Report #84799
[gotcha] Keyword-based input filters bypassed using Unicode homoglyphs or invisible characters
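Normalize Unicode with NFKC, map known homoglyphs to their ASCII equivalents, and strip invisible/control characters such as zero-width spaces or RTL overrides before applying keyword filters or feeding the text to the LLM. NFKC alone only folds compatibility forms (fullwidth letters, ligatures, and similar); it does not turn look-alikes such as 'ɾ' or Cyrillic 'а' into ASCII, so a separate confusables mapping is needed for those.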
Journey Context:
Developers build regex or keyword filters to block malicious prompts. Attackers substitute look-alike characters, such as 'ɾ' for 'r', or inject zero-width spaces between letters. The filter no longer matches the keyword, but the LLM often still reads the text as the original malicious word (its tokenizer may normalize the input, or the model may simply recognize the word despite the noise), so the attack goes through. Keyword filters should therefore run on normalized text; matching against raw input is unreliable.
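
A minimal Python sketch of the normalize-then-filter order described above, using only the standard library's unicodedata module. The names CONFUSABLES, BLOCKED_KEYWORDS, normalize_for_filter and is_blocked are hypothetical, and the tiny homoglyph table is a stand-in for illustration only; a real deployment would likely need a much fuller confusables mapping.

```python
import unicodedata

# Hypothetical, deliberately tiny homoglyph table (assumption for illustration).
# NFKC leaves these characters unchanged, so they are mapped explicitly.
CONFUSABLES = str.maketrans({
    "\u027e": "r",  # LATIN SMALL LETTER R WITH FISHHOOK
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
})

# Hypothetical blocklist; stored lowercase because input is case-folded.
BLOCKED_KEYWORDS = {"ignore previous instructions"}

KEEP_WHITESPACE = {"\n", "\r", "\t"}


def normalize_for_filter(text: str) -> str:
    """Return a canonical copy of text for keyword matching."""
    # 1. Fold compatibility forms (fullwidth letters, ligatures, ...).
    text = unicodedata.normalize("NFKC", text)
    # 2. Drop control (Cc) and format (Cf) characters. Cf covers zero-width
    #    spaces/joiners and bidi overrides such as U+202E. Ordinary
    #    newlines and tabs are kept.
    text = "".join(
        ch for ch in text
        if ch in KEEP_WHITESPACE
        or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # 3. Map known look-alikes to ASCII, then case-fold.
    return text.translate(CONFUSABLES).casefold()


def is_blocked(text: str) -> bool:
    """Check the normalized text, never the raw input, against the blocklist."""
    normalized = normalize_for_filter(text)
    return any(keyword in normalized for keyword in BLOCKED_KEYWORDS)


if __name__ == "__main__":
    raw = "Please ig\u200bno\u027ee previous instructions"
    print("ignore previous instructions" in raw)  # False: raw match misses
    print(is_blocked(raw))                        # True: caught after normalizing
```

Stripping every Cf character also removes zero-width joiners that legitimate emoji and some scripts rely on, so one option is to run the filter on this normalized copy and decide separately what text to forward to the LLM.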
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:55:13.933366+00:00— report_created — created