Report #30195
[gotcha] String-based input filters bypassed using Unicode homoglyphs or special characters
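Normalize Unicode input before applying string-matching safety filters: apply NFKC, then map known homoglyphs to their ASCII equivalents, because NFKC alone does not fold cross-script look-alikes such as Cyrillic 'а' into Latin 'a'. Alternatively, use an LLM-based classifier instead of regex to detect malicious intent.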
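A minimal sketch of the normalization step, assuming a Python filter. The names `CONFUSABLES`, `fold`, and `is_blocked` are hypothetical, and the homoglyph table is deliberately tiny; a real deployment would need a much larger mapping, for example one built from the Unicode confusables data.

```python
import unicodedata

# Hypothetical, intentionally incomplete homoglyph table (illustration only).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
}
_TABLE = str.maketrans(CONFUSABLES)

BLOCKLIST = ("hack", "ignore")  # the example words from this report


def fold(text: str) -> str:
    """NFKC-normalize, case-fold, then map known homoglyphs to ASCII."""
    normalized = unicodedata.normalize("NFKC", text).casefold()
    return normalized.translate(_TABLE)


def is_blocked(text: str) -> bool:
    """Substring match against the blocklist, applied to the folded text."""
    folded = fold(text)
    return any(word in folded for word in BLOCKLIST)
```

NFKC handles compatibility forms such as fullwidth letters, while the explicit table handles cross-script look-alikes that NFKC leaves untouched. Substring matching is kept here only to mirror the filter described in the report; it will also flag harmless words that happen to contain a blocked word.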
Journey Context:
Developers try to prevent attacks by blocking specific words (e.g., 'hack', 'ignore'). Attackers bypass this by substituting characters with visually identical Unicode homoglyphs (e.g., Cyrillic 'а' for Latin 'a'). The string filter compares code points, so it misses the substitution. The LLM tokenizes the altered text differently but typically still interprets it as the intended word.
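The bypass can be reproduced in a few lines. The sketch below assumes a naive substring filter like the one described; `naive_filter` and `scripts_used` are hypothetical helper names. The script check relies on the first word of each character's Unicode name, which is a rough heuristic and not a real script-property lookup.

```python
import unicodedata


def naive_filter(text: str) -> bool:
    """Return True if a blocked word appears as a plain substring."""
    lowered = text.lower()
    return any(word in lowered for word in ("hack", "ignore"))


def scripts_used(text: str) -> set:
    """Rough heuristic: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in text
        if ch.isalpha()
    }


latin = "hack"
spoofed = "h\u0430ck"  # second character is CYRILLIC SMALL LETTER A

print(naive_filter(latin))    # the filter catches the plain word
print(naive_filter(spoofed))  # the homoglyph version slips through
print(scripts_used(spoofed))  # a single word mixing two scripts
```

A word that mixes Latin and Cyrillic letters is a useful warning sign, but treating it as proof of an attack would misfire on legitimate multilingual input, so this is better used as a signal that triggers closer inspection than as a hard block.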
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:04:11.290402+00:00— report_created — created