Report #2527
[gotcha] Case-insensitive regex gives wrong matches for non-ASCII text, especially Turkish
Use Unicode-aware case folding and character classes (Python `re.UNICODE`, which is already the default for `str` patterns in Python 3; PCRE `u`; the JavaScript `u` flag together with `i`; .NET `RegexOptions.IgnoreCase`). For Turkic languages, handle the four I's explicitly (`i`↔`İ`, `ı`↔`I`), because default case folding is locale-independent and does not apply these mappings.
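A minimal Python sketch of the character-class half of the fix, assuming Python 3 `str` patterns (Unicode-aware by default). The names `ASCII_LETTERS` and `UNICODE_LETTERS` are illustrative only, and `[^\W\d_]` is an approximation of "any letter": it means "word character that is neither a decimal digit nor an underscore", so it may also admit a few non-letter numeric characters.

```python
import re

# ASCII-only class: splits words at every accented or non-ASCII letter.
ASCII_LETTERS = re.compile(r"[a-zA-Z]+")

# Unicode-aware approximation of "letters": \w minus decimal digits and "_".
# Python 3 str patterns treat \w and \d as Unicode-aware by default.
UNICODE_LETTERS = re.compile(r"[^\W\d_]+")

# "café straße niño", written with escapes so the code points are unambiguous.
sample = "caf\u00e9 stra\u00dfe ni\u00f1o"

ascii_words = ASCII_LETTERS.findall(sample)
unicode_words = UNICODE_LETTERS.findall(sample)

print(ascii_words)    # fragments: the non-ASCII letters are dropped
print(unicode_words)  # whole words
```

If the input may contain decomposed characters (a base letter followed by a combining mark), normalizing to NFC with `unicodedata.normalize` before matching is probably needed as well, since combining marks are not word characters.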
Journey Context:
The pattern `[a-zA-Z]` or `/[a-z]/i` matches only ASCII letters and misses `é`, `ß`, `ñ`, and the Turkish dotted and dotless I. Even Unicode-aware engines apply default, locale-independent case folding: `I` folds to `i`, while `ı` (dotless lowercase) is a separate letter. In Turkish and Azerbaijani, however, `i` uppercases to `İ` and `ı` uppercases to `I`. The Unicode Standard documents these exceptions in `SpecialCasing.txt`. Default folding is correct for most languages but wrong for Turkic text, so apply locale-aware case normalization to both the pattern and the input before matching.
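One way to apply that normalization in Python is sketched below. It is an illustration under stated assumptions, not a complete locale implementation: `turkic_lower` and `turkic_contains` are hypothetical helper names, only the four I's are special-cased, and the text is assumed to be known Turkish or Azerbaijani. Both sides are lowercased with Turkic rules and then matched case-sensitively, so the engine's own case folding never runs.

```python
import re
import unicodedata

# Turkic rules for the capital I's: I -> ı (dotless), İ -> i (dotted).
# Applied before str.lower(), which would otherwise map I -> i.
_TURKIC_LOWER = str.maketrans({"I": "\u0131", "\u0130": "i"})


def turkic_lower(text: str) -> str:
    """Lowercase text using Turkish/Azerbaijani rules for the four I's."""
    # NFC composes "I" + combining dot above (U+0307) into İ (U+0130),
    # so decomposed input takes the same path as precomposed input.
    composed = unicodedata.normalize("NFC", text)
    return composed.translate(_TURKIC_LOWER).lower()


def turkic_contains(needle: str, haystack: str) -> bool:
    """Case-insensitive substring test under Turkic casing rules."""
    # No re.IGNORECASE here: both sides are already folded.
    pattern = re.compile(re.escape(turkic_lower(needle)))
    return pattern.search(turkic_lower(haystack)) is not None


print(turkic_lower("ISPARTA"))
print(turkic_lower("\u0130STANBUL"))
print(turkic_contains("istanbul", "\u0130STANBUL'DA"))
print(turkic_contains("\u0131stanbul", "\u0130STANBUL"))
```

Because the haystack is folded before searching, any match offsets would refer to the folded string, which is why this sketch returns only a boolean. Applying the same folding to text in other languages would be wrong, for example English `I` would become `ı`.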
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T12:52:21.726009+00:00— report_created — created