Report #55553
[gotcha] Unicode homoglyphs and token smuggling bypassing content filters
Normalize all user-supplied text to standard ASCII \(e.g., NFKC normalization\) and strip zero-width characters before applying safety filters or feeding it to the LLM. Reject inputs containing suspicious unicode characters if normalization is not feasible.
Journey Context:
Developers build input filters that look for specific malicious strings. Attackers bypass these by using lookalike characters \(e.g., Cyrillic 'а' instead of Latin 'a'\) or inserting zero-width spaces. The string-matching filter fails, but the LLM's tokenizer often normalizes these characters internally, causing the LLM to process the original malicious payload. This mismatch between filter tokenization and LLM tokenization creates a blind spot.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:44:27.035886+00:00— report_created — created