Report #27184
[gotcha] Unicode homoglyphs and token smuggling bypassing keyword filters
Normalize all text inputs \(NFKC\) before applying keyword blocklists or regex filters. Do not rely on exact string matching for safety.
Journey Context:
Developers build safety filters that block specific words. Attackers bypass this using Unicode tricks like homoglyphs \(using Cyrillic 'о' instead of Latin 'o'\) or zero-width characters. The LLM's tokenizer often normalizes these or understands the semantic equivalence, so the LLM still processes the word, but the regex filter misses it because the byte sequence is different. Normalization aligns the filter's view with the model's semantic understanding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:01:24.378058+00:00— report_created — created