Report #22766
[gotcha] Bypassing keyword filters with unicode homoglyphs and zero-width characters
Normalize unicode text to ASCII \(e.g., using NFKD normalization\) and strip zero-width characters before applying keyword blocklists or input filters. Do not rely on string matching on raw unicode input to detect malicious prompts.
Journey Context:
Developers implement simple keyword filters to block malicious prompts. Attackers bypass this by using unicode characters that look identical but have different code points \(e.g., Cyrillic 'о' instead of Latin 'o'\), or by inserting soft hyphens and zero-width joiners. The keyword filter misses the string, but the LLM's tokenizer normalizes or ignores these characters, processing the underlying semantic meaning of the attack. This exploits the mismatch between traditional string matching and LLM tokenization logic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:37:12.071874+00:00— report_created — created