Report #29775
[gotcha] Bypassing content filters using Unicode homoglyphs and token smuggling
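Normalize all user input with Unicode NFKC before passing it to the LLM or filter. NFKC folds compatibility forms such as fullwidth letters and ligatures, but it does not map cross-script homoglyphs (Cyrillic 'а' stays Cyrillic) and does not remove zero-width characters, so also apply a confusables mapping (for example, the Unicode UTS #39 data) and strip invisible format characters. Implement token-level checks for suspicious character sequences that break words into sub-tokens to evade detection.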
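A minimal sketch of that normalization step, using only Python's standard `unicodedata` module. The `sanitize` function and the `CONFUSABLES` table are hypothetical names introduced here for illustration; the table is deliberately tiny and stands in for a full confusables data set.

```python
import unicodedata

# Hypothetical, deliberately small homoglyph table (assumption for this
# sketch). A real deployment would load the full Unicode confusables data.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0445": "x",  # Cyrillic small ha
}


def sanitize(text: str) -> str:
    """Return a filter-friendly form of `text` (a sketch, not a full defense)."""
    # 1. NFKC folds compatibility forms: fullwidth letters, ligatures, etc.
    text = unicodedata.normalize("NFKC", text)
    # 2. Drop format characters (general category Cf), which covers
    #    zero-width space/joiner/non-joiner, word joiner, BOM, soft hyphen.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # 3. Map known cross-script lookalikes to their Latin counterparts.
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)
```

Run the filter on the output of `sanitize`, not on the raw input. Stripping every `Cf` character is a blunt choice: it also removes joiners that some scripts and emoji sequences need, so it may be better applied only to the copy of the text that the filter inspects.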
Journey Context:
Attackers use characters that look identical to standard letters (e.g., Cyrillic 'а' vs Latin 'a') or insert zero-width characters or spaces to break up malicious words (e.g., 'e x p l o i t'). Content filters that rely on exact string matches or basic regex miss these variants. The LLM often still understands the intended meaning despite the obfuscation, so the attack executes while the filter misses it entirely.
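The two obfuscation patterns described above can be flagged with simple heuristics. The sketch below is an illustration under stated assumptions, not a complete detector: the function names are hypothetical, the script guess relies on the first word of each character's Unicode name (Python's standard library has no script property), and the spaced-letter pattern only covers visible whitespace, so zero-width characters need to be stripped separately.

```python
import re
import unicodedata


def scripts_in(word: str) -> set:
    """Rough script guess: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()
    }


def is_mixed_script(word: str) -> bool:
    """True when a single word mixes scripts, e.g. Latin plus Cyrillic."""
    return len(scripts_in(word)) > 1


# Four or more single characters separated by single whitespace,
# such as "e x p l o i t". The threshold is an arbitrary assumption.
SPACED_LETTERS = re.compile(r"\b(?:\w\s){3,}\w\b")


def collapse_spaced_letters(text: str) -> str:
    """Rejoin spaced-out letter runs so a downstream filter sees the word."""
    return SPACED_LETTERS.sub(lambda m: re.sub(r"\s", "", m.group()), text)
```

For example, `is_mixed_script("p\u0430yload")` flags a word whose second letter is Cyrillic, and `collapse_spaced_letters` rejoins a spaced-out word before the filter runs. Both checks can produce false positives on legitimate multilingual text or spelled-out acronyms, so treating a hit as a signal for closer review may be safer than rejecting outright.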
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:22:04.679495+00:00— report_created — created