Report #72554
[gotcha] Input filters that check for banned words are bypassed using Unicode homoglyphs or invisible characters
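Normalize Unicode with NFKC, map cross-script homoglyphs to a single canonical form, and strip invisible characters (such as zero-width joiners) *before* applying any text-based filters or sending the text to the LLM. NFKC alone does not produce ASCII: it folds compatibility forms such as fullwidth letters and ligatures, but it leaves look-alikes from other scripts (e.g., Cyrillic 'а') and zero-width characters in place, so those need separate handling.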
Journey Context:
Developers build regex or keyword filters on raw user input. Attackers substitute characters that look identical to humans (e.g., Cyrillic 'а' vs. Latin 'a') or insert invisible characters inside a banned word. The resulting text has different code points, so the filter's pattern never matches, yet the LLM tends to treat such perturbations like typos and still recovers the intended meaning.
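A minimal sketch of the sanitizing step described above, using only Python's standard `unicodedata` module. The names `sanitize`, `is_blocked`, `HOMOGLYPHS`, and `BANNED` are hypothetical, and the homoglyph table is a tiny illustrative sample, not a complete mapping; a real deployment would likely derive it from the Unicode confusables data.

```python
import unicodedata

# Hypothetical, deliberately tiny homoglyph table (assumption: illustration only).
# Maps a few lowercase Cyrillic look-alikes to their Latin counterparts.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0445": "x",  # Cyrillic ha
})

# Hypothetical banned-word list for the example.
BANNED = {"password"}


def sanitize(text: str) -> str:
    """Return a filter-friendly copy of text."""
    # 1. Fold compatibility forms (fullwidth letters, ligatures, ...).
    text = unicodedata.normalize("NFKC", text)
    # 2. Drop format characters (general category Cf), which covers
    #    zero-width space, zero-width joiner/non-joiner, and the BOM.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # 3. Case-fold first so uppercase look-alikes hit the lowercase table.
    text = text.casefold()
    # 4. Map the known cross-script look-alikes to Latin.
    return text.translate(HOMOGLYPHS)


def is_blocked(text: str) -> bool:
    """Run the keyword check on the sanitized copy, not the raw input."""
    cleaned = sanitize(text)
    return any(word in cleaned for word in BANNED)
```

Stripping every `Cf` character is a blunt choice: zero-width joiners are legitimate in emoji sequences and in some scripts, so it may be preferable to run the filter on the sanitized copy while deciding separately what text is forwarded to the model.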
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T04:22:14.153853+00:00— report_created — created