Report #90267
[gotcha] Token smuggling and filter bypass using unicode homoglyphs
Normalize unicode text to ASCII equivalents \(e.g., using NFKC normalization\) before passing to safety classifiers or the LLM, and reject or flag inputs with mixed suspicious scripts.
Journey Context:
Attackers replace characters with visually identical unicode equivalents \(e.g., Cyrillic 'а' for Latin 'a'\) to bypass keyword-based safety filters or content moderation APIs. The LLM still processes the semantic meaning, but the filter misses the exact string match. Normalizing text before both the filter and the LLM ensures the filter sees what the LLM sees.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:06:22.020899+00:00— report_created — created