Report #44739
[gotcha] Unicode homoglyphs and token smuggling bypassing text-based content filters
Normalize text (e.g., apply Unicode normalization such as NFC, or NFKC to also fold compatibility forms like fullwidth letters; strip zero-width characters; map homoglyphs to their base ASCII letters) before applying regex or string-matching content filters and, if possible, before passing the text to the LLM.
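A minimal sketch of this normalization step in Python, assuming a filter that matches a single placeholder banned word. The names `normalize_for_filter`, `is_blocked`, `ZERO_WIDTH`, and `HOMOGLYPHS` are hypothetical, and the homoglyph table is a tiny illustrative sample, not a complete confusables list. The sketch uses NFKC because it also folds compatibility forms such as fullwidth letters; that choice is an assumption, not something the report prescribes.

```python
import re
import unicodedata

# Invisible characters to delete (illustrative, not exhaustive):
# zero-width space, non-joiner, joiner, word joiner, BOM.
ZERO_WIDTH = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"))

# Hypothetical sample table: Cyrillic look-alikes mapped to Latin letters.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0445": "x",  # Cyrillic ha
})

BANNED = re.compile(r"badword")  # placeholder pattern for illustration


def normalize_for_filter(text: str) -> str:
    """Return a filter-friendly copy of text; the original is not modified."""
    text = unicodedata.normalize("NFKC", text)  # fold compatibility forms
    text = text.translate(ZERO_WIDTH)           # drop invisible characters
    text = text.translate(HOMOGLYPHS)           # map look-alikes to Latin
    return text.casefold()                      # case-insensitive matching


def is_blocked(text: str) -> bool:
    """Apply the banned-word pattern to the normalized text."""
    return BANNED.search(normalize_for_filter(text)) is not None
```

Mapping Cyrillic letters to Latin would mangle legitimate Cyrillic text, so it is probably safer to run the filter on a normalized copy and decide separately what to forward to the LLM.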
Journey Context:
Content filters often rely on exact string matching or regex to detect banned words. Attackers evade them with Unicode tricks such as replacing Latin 'a' with Cyrillic 'а', inserting zero-width spaces, or using unusual encodings. The text filter misses the banned word, but the LLM is often robust enough to recover the semantic meaning of the smuggled tokens, so the hidden payload is still executed.
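The bypass described above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `naive_filter` and the pattern `badword` are hypothetical stand-ins for a real filter and a real banned term.

```python
import re

BANNED = re.compile(r"badword")  # placeholder banned word


def naive_filter(text: str) -> bool:
    """Return True if the raw, unnormalized text matches the pattern."""
    return BANNED.search(text) is not None


samples = {
    "plain": "badword",
    "cyrillic a": "b\u0430dword",        # U+0430 looks like Latin 'a'
    "zero-width space": "bad\u200bword",  # U+200B is invisible when rendered
}

for label, text in samples.items():
    print(f"{label}: blocked={naive_filter(text)}")
# plain: blocked=True
# cyrillic a: blocked=False
# zero-width space: blocked=False
```

All three strings render almost identically to a human reader, yet only the first one is caught, because the regex compares code points rather than visual appearance.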
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:33:40.913700+00:00— report_created — created