Report #67891
[gotcha] Hidden Unicode characters or homoglyphs bypassing keyword safety filters
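Normalize all user input to a canonical Unicode form (NFC, or NFKC if compatibility forms such as fullwidth letters should also be folded) and strip zero-width characters, control characters, and other non-printable characters before applying keyword filters or passing the text to the LLM. Normalization alone does not map cross-script homoglyphs such as Cyrillic 'о' to Latin 'o'; those need a confusables mapping (for example the skeleton algorithm in Unicode Technical Standard #39) or a strict character allowlist. Prefer the allowlist wherever the application can tolerate it.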
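A minimal sketch of the normalize-and-strip step, assuming Python's standard `unicodedata` module. The function names `sanitize` and `is_allowed` are hypothetical, and both the choice of NFKC over NFC and the printable-ASCII allowlist are assumptions for illustration, not something this report prescribes.

```python
import unicodedata

# Whitespace control characters we assume the application wants to keep.
KEEP = {"\n", "\t"}


def sanitize(text: str) -> str:
    """Normalize to NFKC, then drop format (Cf) and control (Cc) characters.

    Zero-width space/joiner characters and the BOM are in category Cf;
    NUL, ESC and similar are in category Cc.
    """
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        ch
        for ch in normalized
        if ch in KEEP or unicodedata.category(ch) not in {"Cf", "Cc"}
    )


def is_allowed(text: str) -> bool:
    """Hypothetical strict allowlist: printable ASCII plus newline and tab."""
    return all(ch in KEEP or 0x20 <= ord(ch) <= 0x7E for ch in text)
```

Calling `is_allowed(sanitize(user_input))` before any keyword check rejects input that still contains characters outside the allowed set, including cross-script lookalikes that normalization leaves untouched. An ASCII-only allowlist is too strict for multilingual applications and would need to be widened per use case.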
Journey Context:
Developers often implement simple regex or keyword blocklists (e.g., blocking 'bomb'). Attackers bypass these with Unicode lookalikes (e.g., Cyrillic 'о', U+043E, in place of Latin 'o') or by inserting zero-width spaces (e.g., 'bo' + U+200B + 'mb', which renders as 'bomb'). The LLM can often still recover the intended word from the obfuscated text and so understands the malicious intent, while the naive pre-filter misses it. Normalization and stripping must therefore happen before the filter runs.
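The bypass and the ordering fix can be illustrated with a short sketch. Everything named here (`BLOCKLIST`, `CONFUSABLES`, `naive_filter`, `hardened_filter`) is hypothetical, and the three-entry confusables table is a deliberately tiny stand-in for a full mapping such as the Unicode confusables data; it is not a complete defense.

```python
import re
import unicodedata

# Illustrative blocklist with a single keyword.
BLOCKLIST = re.compile(r"bomb", re.IGNORECASE)

# Assumption: a toy homoglyph table (Cyrillic o, a, e -> Latin).
# A real deployment would need a far larger mapping.
CONFUSABLES = str.maketrans({"\u043e": "o", "\u0430": "a", "\u0435": "e"})


def naive_filter(text: str) -> bool:
    """Return True if the raw text matches the blocklist."""
    return BLOCKLIST.search(text) is not None


def hardened_filter(text: str) -> bool:
    """Normalize, strip format characters, fold lookalikes, then match."""
    cleaned = unicodedata.normalize("NFKC", text)
    # Category Cf covers zero-width space, joiners, BOM, etc.
    cleaned = "".join(ch for ch in cleaned if unicodedata.category(ch) != "Cf")
    cleaned = cleaned.translate(CONFUSABLES)
    return BLOCKLIST.search(cleaned) is not None


homoglyph = "b\u043emb"    # Cyrillic 'о' in place of Latin 'o'
zero_width = "bo\u200bmb"  # zero-width space inside the word

print(naive_filter(homoglyph), hardened_filter(homoglyph))
print(naive_filter(zero_width), hardened_filter(zero_width))
```

In both cases the naive filter sees a string that does not contain the literal keyword, while the hardened filter matches because cleaning runs first. The cleaned text, not the raw input, is what should be checked and, where appropriate, what should be passed on to the LLM.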
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:26:21.742910+00:00— report_created — created