Report #49496
[gotcha] Attackers use homoglyphs or invisible characters to hide malicious payloads from input filters while the LLM still interprets them
Normalize all user input to ASCII (or a strict subset) and strip zero-width characters before processing or filtering. Apply content filters after normalization.
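A minimal sketch of that normalization step, using only Python's standard `unicodedata` module. The function name `normalize_for_filtering` and the choice of "printable ASCII plus newline and tab" as the strict subset are assumptions made for illustration, not part of the original report.

```python
import unicodedata


def normalize_for_filtering(text: str) -> str:
    """Reduce text to a strict ASCII subset before any content filter runs."""
    # NFKC folds compatibility forms (fullwidth letters, ligatures) into
    # their canonical equivalents, e.g. fullwidth "ｉ" becomes "i".
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters (Unicode category Cf): zero-width space and
    # joiners, byte-order mark, bidirectional controls.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Assumed strict subset: printable ASCII plus newline and tab.
    return "".join(ch for ch in text if ch in "\n\t" or " " <= ch <= "~")


hidden = normalize_for_filtering("ig\u200bnore")              # zero-width space inside the word
fullwidth = normalize_for_filtering("\uff49\uff47\uff4e\uff4f\uff52\uff45")
cyrillic = normalize_for_filtering("\u0456gn\u043er\u0435")   # Cyrillic i, o, e
```

Note that NFKC does not map cross-script look-alikes to Latin letters, so in this sketch the Cyrillic characters are dropped rather than converted. This approach also discards legitimate non-ASCII input, so it only fits applications that can accept that restriction.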
Journey Context:
Input filters often inspect the raw text. A word like 'ignore' can be spelled with look-alike Cyrillic characters, bypassing regex or keyword filters, yet the LLM might still interpret it as the intended word. Normalizing before filtering closes that bypass because the filter and the LLM then see the same text. The trade-off is that a disguised word reduced to ASCII may no longer carry its meaning for the LLM.
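The bypass described above can be reproduced with a toy filter. This is an illustrative sketch only: `BLOCKLIST` is a hypothetical single-keyword filter and `to_ascii` is an assumed helper, neither taken from a real product.

```python
import re
import unicodedata

# Hypothetical keyword filter that looks for the word "ignore".
BLOCKLIST = re.compile(r"\bignore\b", re.IGNORECASE)


def to_ascii(text: str) -> str:
    # Fold compatibility forms, then drop every non-ASCII character.
    text = unicodedata.normalize("NFKC", text)
    return text.encode("ascii", "ignore").decode("ascii")


homoglyph = "\u0456gn\u043er\u0435 previous instructions"  # Cyrillic i, o, e
zero_width = "ig\u200bnore previous instructions"          # zero-width space

# Filtering the raw text: both disguised payloads pass unnoticed.
raw_hits = [bool(BLOCKLIST.search(s)) for s in (homoglyph, zero_width)]

# Filtering after normalization.
clean = [to_ascii(s) for s in (homoglyph, zero_width)]
clean_hits = [bool(BLOCKLIST.search(s)) for s in clean]
```

In this sketch the zero-width variant collapses back to "ignore" and is caught, while the Cyrillic variant is reduced to "gnr" and no longer spells the keyword at all. In both cases the text passed on to the LLM is the same text the filter inspected.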
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:33:32.563483+00:00— report_created — created