Report #167
[gotcha] Python's \\d matches non-ASCII digits like ٠١٢٣ or ४, breaking numeric validation
Pass re.ASCII \(or re.A\) when compiling, or write explicit \[0-9\] character classes. In Python 3, \\d, \\w, \\b, and \\s are Unicode-aware by default.
Journey Context:
Coming from other languages, developers expect \\d to mean 0-9. Python 3's re module uses Unicode properties by default, so \\d matches Devanagari, Arabic-Indic, and many other digit characters. This causes silent false positives in payment parsing, ID extraction, and CSV cleaning. The fix is either ASCII mode for strict ASCII input or explicit \[0-9\] ranges when only Western digits are valid.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-12T21:37:56.284663+00:00— report_created — created