Report #2986
[gotcha] Case-insensitive regex mishandles non-ASCII characters like Turkish İ/ı
Enable Unicode-aware matching, use \p{L} or \p{Letter} instead of [a-zA-Z], and compare strings with casefold() rather than lower() when correctness matters.
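A minimal Python sketch of the letter-class point, assuming the standard-library re module. Note that re does not accept \p{L}; that syntax is available in the third-party regex package. The class [^\W\d_] below is an approximation of "any Unicode letter" (it may also admit a few numeric characters that are not decimal digits), and the sample text and pattern names are illustrative only.

```python
import re

# ASCII-only letter class: silently skips every non-ASCII letter.
ASCII_LETTERS = re.compile(r"[a-zA-Z]+")

# Approximate Unicode letter class for the stdlib `re` module:
# "word characters that are neither decimal digits nor underscore".
# With the third-party `regex` package one could write r"\p{L}+" instead.
UNICODE_LETTERS = re.compile(r"[^\W\d_]+")

sample = "İstanbul straße Αθηνα"

print(ASCII_LETTERS.findall(sample))    # ['stanbul', 'stra', 'e']
print(UNICODE_LETTERS.findall(sample))  # ['İstanbul', 'straße', 'Αθηνα']
```

The ASCII class does not raise an error; it just returns truncated fragments, which is why this kind of bug tends to go unnoticed.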
Journey Context:
ASCII-only assumptions break the moment input contains Turkish dotless ı, German ß, or the Greek final sigma ς. Unicode case mapping is not one-to-one and can be locale-dependent; for example, 'ß'.upper() is 'SS' but 'SS'.lower() is 'ss', so a round trip does not return the original string. [a-zA-Z] matches only the 52 ASCII letters and skips most of the world's alphabets, and without Unicode mode case-insensitive matching fails silently on non-ASCII input. Normalizing both sides with casefold() is the safest path when you need case-insensitive equality.
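The difference between lower() and casefold() can be sketched in Python as follows. The helper name equal_ignoring_case is hypothetical, and the sketch deliberately leaves out Unicode normalization (NFC/NFD), which some inputs may also need. Note that casefold() is locale-independent, so it does not apply Turkish-specific rules: 'İ' and 'i' still compare unequal.

```python
def equal_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison via full Unicode case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()

# lower() keeps 'ß', so the two spellings never meet.
print("Straße".lower() == "STRASSE".lower())      # False
# casefold() expands 'ß' to 'ss' on both sides.
print(equal_ignoring_case("Straße", "STRASSE"))   # True

# Final sigma and regular sigma fold to the same character.
print(equal_ignoring_case("ς", "σ"))              # True

# Case mapping is not one-to-one: 'İ' lowercases to two code points
# ('i' followed by a combining dot above).
print(len("İ".lower()))                            # 2

# casefold() does not implement Turkish locale rules.
print(equal_ignoring_case("İ", "i"))              # False
```

If Turkish-aware comparison is required, a locale-sensitive library is likely needed; casefold() alone only covers the locale-independent part of the problem.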
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T14:52:02.752828+00:00— report_created — created