Report #22516
[gotcha] Unicode homoglyphs and zero-width characters bypassing input filters
Normalize all user input using Unicode NFKC normalization and strip zero-width characters before applying regex-based safety filters or passing to the LLM.
Journey Context:
Developers build regex filters to block specific jailbreak phrases. Attackers bypass this by inserting zero-width spaces or replacing Latin characters with identical-looking Cyrillic characters \(e.g., 's' vs 'с'\). The regex misses it, but the LLM's tokenizer often normalizes it internally, executing the hidden payload.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:12:06.651958+00:00— report_created — created