Report #26623
[gotcha] Prompt filters fail to detect malicious keywords disguised with Unicode lookalikes
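Normalize Unicode characters (e.g., using NFKC normalization) in user inputs before applying keyword filters or feeding them to the LLM. NFKC folds compatibility forms such as fullwidth letters and ligatures, but it does not map cross-script homoglyphs (Cyrillic 'о' stays Cyrillic) and does not remove zero-width characters, so also strip invisible format characters and block or flag inputs that mix suspicious Unicode scripts.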
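A minimal sketch of this fix in Python, using only the standard library. The names `BLOCKLIST`, `normalize_input`, `has_mixed_script_word`, and `check_input` are hypothetical and exist only for illustration. The script check is a rough heuristic based on Unicode character names, because the standard `unicodedata` module does not expose the Script property; a production filter would more likely rely on the confusables data from Unicode UTS #39.

```python
import unicodedata

BLOCKLIST = {"bomb"}  # hypothetical blocklist, for illustration only

# Scripts whose letters are commonly confused with each other (assumption:
# this short list is illustrative, not exhaustive).
CONFUSABLE_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK"}


def normalize_input(text: str) -> str:
    """Apply NFKC, drop invisible format characters, and casefold."""
    folded = unicodedata.normalize("NFKC", text)
    # Category "Cf" covers zero-width joiner (U+200D), zero-width
    # non-joiner (U+200C), zero-width space (U+200B), and similar.
    visible = "".join(ch for ch in folded if unicodedata.category(ch) != "Cf")
    return visible.casefold()


def scripts_in_word(word: str) -> set:
    """Heuristic: take the first word of each letter's Unicode name as its script."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split()[0])
    return scripts & CONFUSABLE_SCRIPTS


def has_mixed_script_word(text: str) -> bool:
    """True if any whitespace-separated word mixes confusable scripts."""
    return any(len(scripts_in_word(word)) > 1 for word in text.split())


def check_input(text: str) -> str:
    """Return 'flag', 'block', or 'allow' for a user input."""
    cleaned = normalize_input(text)
    if has_mixed_script_word(cleaned):
        return "flag"
    if any(word in cleaned for word in BLOCKLIST):
        return "block"
    return "allow"
```

With this sketch, `check_input("b\u200domb")` is blocked because the zero-width joiner is stripped before matching, while `check_input("b\u043emb")` is flagged because a single word mixes Latin and Cyrillic letters. Text written entirely in one script, such as plain Cyrillic, is allowed through.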
Journey Context:
Developers use simple keyword blocklists (e.g., blocking 'bomb'). Attackers substitute Unicode homoglyphs (e.g., Cyrillic 'о' instead of Latin 'o') or insert zero-width joiners to break up the keyword (b[ZWJ]omb). The altered string no longer matches the blocklist entry byte for byte, so the keyword filter misses it. The LLM, however, often still recovers the intended word, either because its tokenizer normalizes the input or because the model is robust to such perturbations, and it executes the malicious request.
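The bypass can be reproduced with a few lines of Python. This is an illustrative sketch only: `naive_filter` is a hypothetical stand-in for a simple substring blocklist, not any particular product's filter, and `code_points` is a helper added here to make the invisible differences visible.

```python
def naive_filter(text: str, blocklist=("bomb",)) -> bool:
    """Hypothetical keyword filter: True if a blocked word appears verbatim."""
    lowered = text.lower()
    return any(word in lowered for word in blocklist)


def code_points(text: str) -> list:
    """List each character as U+XXXX so lookalikes can be told apart."""
    return [f"U+{ord(ch):04X}" for ch in text]


plain = "bomb"
homoglyph = "b\u043emb"   # Cyrillic 'о' (U+043E) in place of Latin 'o'
joined = "b\u200domb"     # zero-width joiner (U+200D) after the first letter

for sample in (plain, homoglyph, joined):
    print(naive_filter(sample), code_points(sample))
```

Only the plain string is caught. The other two render almost identically on screen but differ at the code-point level, which is all a substring match looks at.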
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:05:10.620571+00:00— report_created — created