Report #58819
[gotcha] Invisible Unicode characters or homoglyphs bypass content filters and safety classifiers
Normalize and sanitize all input text before passing it to the LLM or moderation APIs. Strip zero-width characters and replace Cyrillic/Greek homoglyphs (e.g., Cyrillic 'а' vs. Latin 'a') with their standard ASCII equivalents.
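A minimal sketch of that sanitization step in Python, using only the standard library. The `sanitize` function and the `HOMOGLYPHS` table are hypothetical names introduced here for illustration, and the table is deliberately tiny; real coverage would need a far larger mapping, for example one derived from the Unicode confusables data. Note that NFKC normalization on its own does not map Cyrillic or Greek letters to Latin, which is why the explicit table is needed.

```python
import unicodedata

# Hypothetical, deliberately incomplete homoglyph table (illustration only).
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
    "\u0441": "c",  # Cyrillic small es
    "\u0445": "x",  # Cyrillic small ha
    "\u03b1": "a",  # Greek small alpha
    "\u03bf": "o",  # Greek small omicron
})


def sanitize(text: str) -> str:
    """Normalize text before it reaches a filter, classifier, or LLM."""
    # NFKC folds compatibility forms such as fullwidth letters and ligatures.
    text = unicodedata.normalize("NFKC", text)
    # Drop format characters (general category Cf), which includes the
    # zero-width space U+200B, ZWNJ U+200C, ZWJ U+200D, and BOM U+FEFF.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Map known look-alike letters to their ASCII counterparts.
    return text.translate(HOMOGLYPHS)


print(sanitize("p\u0430ss\u200bword"))
```

Stripping every `Cf` character is a blunt choice: it also removes the joiners used in emoji sequences and in scripts such as Persian, and mapping Cyrillic or Greek letters to ASCII will corrupt legitimate text in those languages. One option is to run filters on the sanitized copy while keeping the original text for display.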
Journey Context:
Attackers insert zero-width spaces or substitute Cyrillic look-alikes to build payloads that look benign to regex-based input filters, yet the LLM often still reads them as the restricted words. The filter sees a benign string; the LLM sees the malicious one.
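The filter side of this mismatch can be reproduced with a few lines of Python. This is a sketch under assumptions: `BLOCKLIST`, `naive_filter_blocks`, and the placeholder word "forbidden" are hypothetical and stand in for whatever regex-based filter is actually deployed. It shows only that the filter misses the obfuscated variants; how a given model interprets them is not demonstrated here.

```python
import re

# Hypothetical blocklist with a placeholder restricted word (illustration only).
BLOCKLIST = re.compile(r"\bforbidden\b", re.IGNORECASE)


def naive_filter_blocks(text: str) -> bool:
    """Return True if the regex-based filter would reject the text."""
    return BLOCKLIST.search(text) is not None


plain = "forbidden"
zero_width = "for\u200bbidden"   # zero-width space inserted mid-word
homoglyph = "f\u043erbidden"     # Cyrillic 'о' (U+043E) replaces Latin 'o'

for label, payload in [("plain", plain),
                       ("zero-width", zero_width),
                       ("homoglyph", homoglyph)]:
    # Print the code points as well, since the strings render almost identically.
    codepoints = " ".join(f"U+{ord(ch):04X}" for ch in payload)
    print(label, naive_filter_blocks(payload), codepoints)
```

Only the plain string is caught; the other two differ from it at the code-point level even though they look the same on screen.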
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:12:59.784317+00:00— report_created — created