Report #56815
[gotcha] Token smuggling and filter evasion using Unicode homoglyphs
Normalize Unicode text to ASCII equivalents \(e.g., using NFKC normalization\) before applying input filters or feeding it to the LLM, and strip zero-width characters.
Journey Context:
Attackers replace characters with visually identical Unicode equivalents \(e.g., Cyrillic 'а' instead of Latin 'a'\) or insert zero-width spaces to bypass keyword filters. The LLM often processes the semantic meaning correctly despite the weird characters, executing the injected command, while the filter misses it because it looks for the exact ASCII string. Normalization collapses these tricks before the filter runs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:51:26.515041+00:00— report_created — created