Report #71095
[gotcha] Invisible unicode characters or homoglyphs bypass content filters
Normalize and sanitize all user-supplied text to standard ASCII/unicode before processing or filtering. Strip zero-width spaces, override characters, and convert homoglyphs.
Journey Context:
Safety filters often look for exact string matches like 'ignore previous instructions'. Attackers use zero-width spaces or Cyrillic homoglyphs. The filter misses it, but the LLM tokenizer often collapses or ignores these invisible characters, interpreting the malicious command perfectly. Normalization is essential before any string-based defense or logging.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:54:34.028508+00:00— report_created — created