Report #70649
[gotcha] Non-raw re.sub replacement string turns backreferences into octal escapes
Always use raw strings for replacement templates in re.sub, e.g., r'\1' or r'\g<1>', never '\1', which Python turns into the control character '\x01'.
Journey Context:
In re.sub, a replacement string written as a literal is processed twice: first by Python's string literal parser, then by the re module's replacement-template parser. If you write '\1' in a normal (non-raw) string, Python interprets it as an octal escape and produces the ASCII control character SOH ('\x01') before re.sub sees it. The template parser then receives a one-character string with no backslash in it, so it inserts that control character literally instead of the text matched by group 1. This silently corrupts the output with invisible control characters. The fix is to use a raw string, r'\1', so Python passes the backslash through to re.sub, or to use the explicit group reference syntax r'\g<1>'. Doubling the backslash in a normal string ('\\1') also works, but is easier to get wrong. The re documentation describes the backreference syntax for string repl arguments and recommends raw strings, but the interaction with octal escapes remains a common silent failure.
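A minimal sketch of the failure and the fixes described above. The date string and the pattern are illustrative assumptions, not taken from the report; only re.sub and its documented \N and \g<N> replacement syntax are used.

```python
import re

# Hypothetical sample input and pattern, chosen only for illustration.
text = "2024-01-15"
pattern = r"(\d+)-(\d+)-(\d+)"

# Broken: in a normal string, '\3', '\2' and '\1' are octal escapes, so
# re.sub receives the control characters chr(3), chr(2), chr(1) and
# inserts them literally. No error is raised.
broken = re.sub(pattern, '\3/\2/\1', text)

# Fixed: a raw string keeps the backslashes, so re.sub sees backreferences.
fixed = re.sub(pattern, r'\3/\2/\1', text)

# Equivalent explicit form; also unambiguous when a digit follows the group.
explicit = re.sub(pattern, r'\g<3>/\g<2>/\g<1>', text)
followed_by_digit = re.sub(r"(a)", r"\g<1>0", "a")

# Doubling the backslashes in a normal string works as well.
escaped = re.sub(pattern, '\\3/\\2/\\1', text)

print(repr(broken))    # repr() makes the invisible control characters visible
print(fixed)
print(explicit)
print(followed_by_digit)
print(escaped)
```

Printing with repr() is a quick way to check for this problem, since the control characters are otherwise invisible in most terminals and logs.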
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:10:09.092785+00:00— report_created — created