Report #85699
[gotcha] Keyword filters and safety classifiers are bypassed using invisible Unicode characters or homoglyphs that the LLM still processes
Normalize text (for example with Unicode NFKC) and strip non-printable or invisible Unicode characters (like Zero-Width Joiners or soft hyphens) from user inputs before passing them to safety filters or the LLM. Run the filter on the normalized string, not the raw one.
Journey Context:
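Developers implement regex or keyword-based safety filters on the raw string. An attacker submits d̶o̶ ̶b̶a̶d̶ (letters interleaved with combining strikethrough marks) or bad\u00ADword (a soft hyphen, U+00AD, hidden inside the word). The filter compares code points, so it misses the match. The LLM often still recovers the intended text: depending on the model, the tokenizer's own normalization may drop these characters, or the model simply reads through them. Either way it effectively sees the 'clean' malicious string that the filter missed.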
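The normalization step described above could be sketched roughly as follows. This is a minimal illustration using only Python's standard `unicodedata` module, not a vetted implementation. The names `sanitize`, `is_blocked`, and `BLOCKED_KEYWORDS` are hypothetical and exist only for this example.

```python
import unicodedata

# Hypothetical blocklist, for illustration only.
BLOCKED_KEYWORDS = {"badword"}


def sanitize(text: str) -> str:
    """Return a copy of text with invisible and combining characters removed."""
    # NFKD folds compatibility forms (e.g. fullwidth letters) to plain
    # equivalents and splits accented letters into base + combining mark.
    decomposed = unicodedata.normalize("NFKD", text)
    kept = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        # Cf: format characters (soft hyphen U+00AD, ZWJ U+200D, ZWSP U+200B).
        # Mn / Me: combining marks (e.g. the U+0336 strikethrough overlay).
        if category in ("Cf", "Mn", "Me"):
            continue
        # Cc: control characters; ordinary whitespace controls are kept.
        if category == "Cc" and ch not in "\n\t\r":
            continue
        kept.append(ch)
    return unicodedata.normalize("NFC", "".join(kept))


def is_blocked(text: str) -> bool:
    """Keyword check that runs on the sanitized string, not the raw input."""
    cleaned = sanitize(text).casefold()
    return any(keyword in cleaned for keyword in BLOCKED_KEYWORDS)


if __name__ == "__main__":
    raw = "bad\u00ADword"
    print("badword" in raw)   # naive check on the raw string
    print(is_blocked(raw))    # check on the sanitized string
```

Two caveats on this sketch. Dropping every combining mark and format character is aggressive: it also removes legitimate accents, breaks emoji sequences joined with ZWJ, and damages scripts that depend on combining marks, so it may be safer to use the sanitized copy only for filtering and decide separately what to forward to the model. Compatibility normalization also does not fold cross-script homoglyphs (for instance a Cyrillic "а" standing in for a Latin "a"); those need a separate confusables mapping.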
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:26:03.566829+00:00— report_created — created