Report #82632
[gotcha] Unicode Homoglyphs and Zero-Width Characters Bypass Keyword Filters but Execute in LLM
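Normalize Unicode (NFKC) and strip zero-width and control characters from user input before passing it to either the safety filter or the LLM. NFKC alone does not map cross-script homoglyphs such as Cyrillic 'а' (U+0430) to Latin 'a'; those additionally need a confusables mapping (for example the skeleton algorithm in Unicode UTS #39) or mixed-script detection.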
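A minimal sketch of that pre-processing step, assuming Python and its standard `unicodedata` module. The function name `sanitize` and the choice to keep newlines and tabs are assumptions made for illustration, not part of the original report.

```python
import unicodedata


def sanitize(text: str) -> str:
    """NFKC-normalize, then drop format (Cf) and control (Cc) characters.

    Hypothetical helper: zero-width space, zero-width joiners, and the BOM
    all fall in category Cf. Newline and tab are kept on purpose.
    """
    normalized = unicodedata.normalize("NFKC", text)
    kept = []
    for ch in normalized:
        category = unicodedata.category(ch)
        if category == "Cf":
            continue  # invisible format characters, e.g. U+200B
        if category == "Cc" and ch not in "\n\t":
            continue  # other control characters, e.g. NUL
        kept.append(ch)
    return "".join(kept)


cleaned = sanitize("k\u200bill")
```

The same sanitized string should be handed to both the filter and the LLM so that both see identical text. Dropping every Cf character also removes joiners that are legitimate in emoji sequences and in some scripts, so this is a trade-off to check against the expected input.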
Journey Context:
Developers write regex or keyword filters against the raw input. The attacker writes k\u200bill (a zero-width space inside the word) or substitutes Cyrillic 'а' (U+0430) for Latin 'a'. The filter sees a string that does not match any keyword and passes it through. The LLM still processes the text as 'kill': depending on the model, the tokenizer may normalize the input (some tokenizer configurations apply NFKC), or the model simply reads through the obfuscation, and the malicious behavior is triggered either way. The vulnerability is the mismatch between how the filter interprets the text and how the LLM interprets it.
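The bypass can be reproduced with a toy filter. This is only a sketch: the pattern, the helper name `naive_filter_blocks`, and the sample word "attack" are hypothetical and stand in for whatever keyword list a real filter uses.

```python
import re
import unicodedata

# Hypothetical keyword filter applied to raw, unnormalized input.
BLOCKLIST = re.compile(r"kill|attack", re.IGNORECASE)


def naive_filter_blocks(text: str) -> bool:
    """Return True when the raw text matches a blocked keyword."""
    return BLOCKLIST.search(text) is not None


plain = "kill"
zero_width = "k\u200bill"       # zero-width space splits the keyword
homoglyph = "\u0430ttack"       # Cyrillic 'а' (U+0430) replaces Latin 'a'

blocked_plain = naive_filter_blocks(plain)            # keyword is caught
blocked_zero_width = naive_filter_blocks(zero_width)  # slips past the filter
blocked_homoglyph = naive_filter_blocks(homoglyph)    # slips past the filter

# NFKC leaves the Cyrillic letter untouched, so normalization alone
# would not make the homoglyph variant match the keyword.
nfkc_changes_homoglyph = unicodedata.normalize("NFKC", homoglyph) != homoglyph
```

All three strings look identical or nearly identical when rendered, yet only the first one matches the pattern.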
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:17:21.679878+00:00— report_created — created