Report #29615
[gotcha] Unicode homoglyphs and token smuggling bypassing text filters
Normalize text (e.g., NFKC) and strip zero-width and other non-printable characters before applying safety filters or passing user input to the LLM. Do not rely on exact string matching for safety.
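A minimal Python sketch of this fix, assuming the filter being protected is a regex blocklist. The `sanitize` and `is_blocked` helpers and the blocklist pattern are hypothetical and for illustration only. `unicodedata.normalize` and `unicodedata.category` are standard-library calls.

```python
import re
import unicodedata

# Control characters kept because they carry layout meaning in normal input.
_KEEP = {"\n", "\t", "\r"}

# Hypothetical blocklist pattern, for illustration only (matched after casefold).
BLOCKLIST = re.compile(r"ignore\s+previous\s+instructions")


def sanitize(text: str) -> str:
    """Normalize input before it reaches a safety filter or the LLM."""
    # NFKC folds compatibility variants (fullwidth letters, ligatures,
    # superscripts) into their standard forms.
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters (category Cf: zero-width space/joiners,
    # bidi overrides such as U+202E, BOM) and non-whitespace controls (Cc).
    return "".join(
        ch for ch in text
        if ch in _KEEP or unicodedata.category(ch) not in ("Cf", "Cc")
    )


def is_blocked(text: str) -> bool:
    """Run the blocklist on the sanitized, case-folded text."""
    return BLOCKLIST.search(sanitize(text).casefold()) is not None
```

Category Cf also contains the zero-width joiner used in emoji sequences and the zero-width non-joiner that some scripts (e.g., Persian) need, so stripping them can alter legitimate text. A more conservative variant applies `sanitize` only to the copy that the filter inspects. This sketch still lets cross-script homoglyphs through: `is_blocked("ign\u043ere previous instructions")` returns `False` because the Cyrillic 'о' survives NFKC.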
Journey Context:
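Developers often implement simple blocklists or regex filters to catch malicious prompts. Attackers bypass these with Unicode tricks: replacing characters with homoglyphs (e.g., Cyrillic 'а' instead of Latin 'a'), inserting zero-width spaces, or using right-to-left overrides. The filter sees a string that matches no blocked pattern, but the LLM, which tokenizes the underlying bytes, often still recovers the malicious intent. Normalization is essential to align the filter's view with the LLM's view. NFKC alone is not enough: it folds compatibility forms such as fullwidth letters but leaves zero-width characters, bidi overrides, and cross-script homoglyphs untouched. Pair it with stripping format characters and with a confusables check (e.g., the skeleton mapping in Unicode UTS #39) for homoglyphs.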
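The homoglyph case can be sketched in Python as follows. The sketch first shows that NFKC leaves a Cyrillic 'о' untouched, then applies two mitigations. The `CONFUSABLES` table is a tiny hand-picked subset assumed for illustration, not the real Unicode confusables data. `has_mixed_script` is a hypothetical, crude heuristic that guesses each letter's script from the first word of its Unicode name.

```python
import unicodedata

# Tiny, hand-picked subset of Cyrillic/Greek look-alikes (illustration only).
# A real deployment should load the full Unicode confusables data (UTS #39).
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def fold_confusables(text: str) -> str:
    """Map known look-alike characters to their Latin counterparts."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)


def scripts_in(word: str) -> set[str]:
    """Crude script guess: first word of each letter's Unicode name."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in word
        if ch.isalpha()
    }


def has_mixed_script(text: str) -> bool:
    """Flag any word mixing scripts, e.g. Latin letters plus one Cyrillic 'о'."""
    return any(len(scripts_in(word)) > 1 for word in text.split())


spoofed = "ign\u043ere"  # 'o' replaced with CYRILLIC SMALL LETTER O
print(unicodedata.normalize("NFKC", spoofed) == spoofed)  # NFKC does not fold it
print(fold_confusables(spoofed))
print(has_mixed_script("please " + spoofed + " previous instructions"))
```

Folding look-alikes into Latin would corrupt legitimate Cyrillic or Greek text, so it belongs only on the copy that the filter matches, never on the text sent to the LLM. The name-based script check will also flag legitimate mixed-script writing (e.g., Japanese kanji with kana), so treat a hit as a signal for closer inspection rather than an automatic block.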
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:05:58.743512+00:00— report_created — created