Report #5470
[gotcha] Infinite loop when manually iterating regex searches with zero-width matches
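When manually looping with `pattern.search(text, pos)`, advance with `pos = max(match.end(), match.start() + 1)`. A zero-width match then still moves the search forward by one character. Stop once `pos` exceeds `len(text)`, because an empty match can occur at the very end of the string. Prefer `pattern.finditer()`, which handles zero-width matches internally: after an empty match, it does not report another empty match at the same position.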
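A minimal sketch of the safe manual loop, assuming Python 3's `re` module. `iter_matches` is a hypothetical helper name, not part of `re`. The `pos <= len(text)` guard matters because `search()` treats a `pos` past the end of the string as `len(text)`. Without the guard, an empty match at the very end would be found again on every pass.

```python
import re
from typing import Iterator


def iter_matches(pattern: re.Pattern, text: str) -> Iterator[re.Match]:
    """Yield successive non-overlapping matches, stepping past zero-width ones.

    Hypothetical helper: a sketch of the manual search loop, not a re API.
    """
    pos = 0
    # pos may equal len(text), since an empty match is possible at the very end.
    # Stop once pos goes past it, or search() would re-find that end match forever.
    while pos <= len(text):
        m = pattern.search(text, pos)
        if m is None:
            break
        yield m
        # Non-empty match: resume at its end. Empty match: step one char past it.
        pos = max(m.end(), m.start() + 1)


word_boundary = re.compile(r"\b")
print([m.start() for m in iter_matches(word_boundary, "hi there")])  # [0, 2, 3, 8]
```

For simple patterns such as `\b` or `\w*`, this should agree with `finditer()`. To collect overlapping matches instead, one option is `pos = m.start() + 1`, which also always advances.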
Journey Context:
Patterns like `r'\b'` (word boundary) or lookahead assertions such as `(?=e)` match zero characters. `re.finditer()` avoids infinite loops by guaranteeing progress after a zero-width match. However, you might implement a manual search loop, for example to process overlapping matches or to do custom tokenization. In that case, using `pos = match.end()` causes an infinite loop whenever `match.start() == match.end()`: the next search starts at the same position and matches the same zero-width string again. In production parsers this shows up as a silent hang, with no exception and no output.
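To reproduce the hang safely, the sketch below runs the buggy `pos = match.end()` update with an iteration cap. `buggy_positions` and `max_iters` are hypothetical names added for illustration. Every iteration reports the same zero-width match, while `finditer()` on the same input moves on.

```python
import re


def buggy_positions(pattern, text, max_iters=5):
    """Run the buggy manual loop, capped at max_iters (real code would use `while True`)."""
    positions = []
    pos = 0
    for _ in range(max_iters):
        m = pattern.search(text, pos)
        if m is None:
            break
        positions.append(m.start())
        # BUG: for a zero-width match end() == start(), so the next search
        # restarts on the same empty match.
        pos = m.end()
    return positions


print(buggy_positions(re.compile(r"\b"), "hi there"))       # [0, 0, 0, 0, 0]
print(buggy_positions(re.compile(r"(?=e)"), "hi there"))    # [5, 5, 5, 5, 5]
print([m.start() for m in re.finditer(r"\b", "hi there")])  # [0, 2, 3, 8]
```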
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T21:20:00.898601+00:00: report_created (created)