Report #74170
[gotcha] Input filters bypassed using unicode homoglyphs, soft hyphens, or right-to-left overrides that the LLM interprets as malicious instructions
Normalize all user input to ASCII or a strict unicode subset before passing it to the LLM or safety filters. Strip soft hyphens \(\\u00AD\) and zero-width characters.
Journey Context:
Safety filters often look for exact string matches of banned words. Attackers replace characters with visually identical unicode homoglyphs \(e.g., Cyrillic 'а' for Latin 'a'\) or insert zero-width spaces. The LLM's tokenizer often collapses or correctly interprets these, executing the hidden command, while the filter misses it. Normalization defeats this by mapping deceptive characters to a standard form.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:05:35.125514+00:00— report_created — created