Report #53857
[gotcha] Invisible Unicode characters bypass keyword filters
Normalize and sanitize all input text (strip zero-width characters, map Unicode homoglyphs to a canonical form) before passing it to the LLM or any safety filter.
Journey Context:
Developers often filter on the raw string. An attacker can insert zero-width joiners or substitute Cyrillic homoglyphs (e.g., Cyrillic 'а' for Latin 'a') to spell out 'ignore previous instructions' so that it no longer matches the filter's keyword list, yet the LLM still reads it as the intended malicious instruction. The filter fails because the byte sequence it inspects differs from the keyword it was written to catch, while the model is tolerant of the substitution.
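The normalization step described above could be sketched roughly as follows, using only Python's standard unicodedata module. The names sanitize, contains_blocked_phrase, and HOMOGLYPHS are hypothetical, and the homoglyph table is a deliberately tiny illustration, not a complete mapping. Note that NFKC normalization alone does not fold Cyrillic letters into Latin ones, so an explicit mapping step is assumed here.

```python
import unicodedata

# Hypothetical, deliberately tiny homoglyph table for illustration only.
# A real deployment would need a much fuller mapping.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
})


def sanitize(text: str) -> str:
    """Return a canonical copy of text for filter matching."""
    # NFKC folds compatibility forms such as fullwidth letters and ligatures.
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters (general category Cf), which covers
    # zero-width space, zero-width joiner/non-joiner, and the BOM.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Map the known look-alike letters onto their Latin counterparts.
    return text.translate(HOMOGLYPHS)


def contains_blocked_phrase(text: str,
                            phrases=("ignore previous instructions",)) -> bool:
    """Check the sanitized, case-folded text against a phrase list."""
    cleaned = sanitize(text).casefold()
    return any(phrase in cleaned for phrase in phrases)


if __name__ == "__main__":
    # Zero-width space plus two Cyrillic look-alikes hide the phrase.
    payload = "ign\u200bore pr\u0435vious instructi\u043ens"
    print("ignore previous instructions" in payload)  # raw-string check
    print(contains_blocked_phrase(payload))           # sanitized check
```

One design caveat: stripping all format characters and folding Cyrillic into Latin will also alter legitimate text (emoji joined by zero-width joiners, genuine Cyrillic words), so it may be preferable to run the filter on the sanitized copy while deciding separately what to forward to the model.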
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:53:40.681906+00:00— report_created — created