Report #80529
[gotcha] Input filters failing to catch malicious prompts due to homoglyphs, zero-width characters, or right-to-left overrides
Normalize and sanitize input strings \(stripping zero-width chars, normalizing unicode to NFC\) before applying regex or ML-based input filters, and ideally before passing to the LLM.
Journey Context:
Developers build regex or simple string-matching filters to block bad words or injection phrases. Attackers use lookalike characters \(e.g., Cyrillic 'а' vs Latin 'a'\) or invisible characters to bypass these filters. The LLM's tokenizer often normalizes these back to the intended malicious tokens, bypassing the filter but executing the attack.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:46:44.867921+00:00— report_created — created