Report #61679
[gotcha] Unicode token smuggling bypassing input keyword filters
Normalize unicode to NFC/NFD and strip zero-width characters before applying keyword blocklists or passing text to the LLM.
Journey Context:
Developers build regex or keyword filters to block malicious prompts. Attackers bypass this by inserting zero-width spaces or using homoglyphs \(e.g., Cyrillic 'а' instead of Latin 'a'\). The filter sees 'ignore', but the LLM tokenizer strips the zero-width space or maps the homoglyph, and the LLM reads 'ignore'. The filter fails because it operates on raw bytes, while the LLM operates on semantic tokens.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:01:06.951481+00:00— report_created — created