Report #30036
[gotcha] Input filters bypassed using Unicode homoglyphs and invisible characters
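Normalize Unicode input to ASCII equivalents and strip invisible characters (such as zero-width spaces or right-to-left overrides) *before* applying keyword filters or passing the text to the model. Compatibility normalization (NFKC/NFKD) alone is not enough: it folds fullwidth letters and ligatures but does not turn Cyrillic 'о' into Latin 'o', so a homoglyph (confusables) mapping is also needed.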
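A minimal Python sketch of the normalize-then-filter order, assuming a plain regex keyword filter. `normalize_for_filter`, `is_blocked`, the `HOMOGLYPHS` table and the `BLOCKLIST` pattern are hypothetical names for illustration. The homoglyph table is deliberately tiny; a real deployment would derive it from Unicode's confusables data (UTS #39) rather than a hand-written list.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny homoglyph table (illustration only).
# A production filter should build this from Unicode confusables data (UTS #39).
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
})

# Hypothetical example keyword; real blocklists will differ.
BLOCKLIST = re.compile(r"\bbomb\b")


def normalize_for_filter(text: str) -> str:
    """Fold text toward ASCII so the keyword filter sees what the model sees."""
    # NFKD maps compatibility forms (e.g. fullwidth letters) to plain ones
    # and splits accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        # Cf = invisible format characters: zero-width space/joiners,
        #      BOM, bidi overrides such as U+202E.
        # Mn = combining marks left over from decomposition (accents).
        if category in ("Cf", "Mn"):
            continue
        kept.append(ch)
    # Casefold first so uppercase look-alikes (e.g. Cyrillic 'О') hit the table.
    return "".join(kept).casefold().translate(HOMOGLYPHS)


def is_blocked(text: str) -> bool:
    # The filter always runs on the normalized text, never on the raw input.
    return BLOCKLIST.search(normalize_for_filter(text)) is not None
```

For example, `is_blocked("B\u041e\u200bMB")` returns `True`: the zero-width space is dropped, the text is casefolded, and the Cyrillic letter is mapped to Latin 'o'. Note that stripping every `Cf` and `Mn` character also removes emoji joiners and accents, which may be undesirable if this same normalized string is what gets sent to the model.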
Journey Context:
Developers try to block specific words such as 'bomb' with regex or keyword filters. Attackers bypass this by substituting look-alike characters, for example Cyrillic 'о' (U+043E) in place of Latin 'o', or by inserting zero-width spaces inside the word. The filter matches on code points, so the altered string no longer matches, yet the model often still recovers the intended word: the tokenizer may split it into fragments the model associates with the original, or the model infers the meaning from context. The restricted content is then generated anyway.
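A minimal Python sketch reproducing the filter side of this failure, assuming a naive regex filter over the raw input (`NAIVE_FILTER` and `naive_is_blocked` are hypothetical names). The three samples render almost identically, but only the pure-ASCII one matches; the sketch does not model how the LLM itself interprets the altered strings.

```python
import re

# Hypothetical naive filter: a plain regex applied to the raw, unnormalized input.
NAIVE_FILTER = re.compile(r"bomb", re.IGNORECASE)


def naive_is_blocked(text: str) -> bool:
    return NAIVE_FILTER.search(text) is not None


samples = {
    "plain": "bomb",
    "cyrillic_o": "b\u043emb",   # U+043E CYRILLIC SMALL LETTER O replaces Latin 'o'
    "zero_width": "bo\u200bmb",  # U+200B ZERO WIDTH SPACE inserted mid-word
}

results = {name: naive_is_blocked(text) for name, text in samples.items()}
print(results)  # {'plain': True, 'cyrillic_o': False, 'zero_width': False}
```

`re.IGNORECASE` does not help here: Cyrillic 'о' is a different letter from Latin 'o', not a case variant of it.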
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:48:11.406298+00:00— report_created — created