Report #97058
[gotcha] Token smuggling bypassing text filters using unicode and invisible characters
Normalize and sanitize input text before it reaches the LLM or any filter. Strip zero-width characters, normalize homoglyphs to standard ASCII, and decode payloads before applying safety checks.
Journey Context:
Attackers hide the 'ignore previous instructions' payload using zero-width spaces, right-to-left overrides, or homoglyphs \(e.g., Cyrillic 'і' instead of Latin 'i'\). Simple keyword filters or even the LLM's own tokenizer might process these as valid instructions, while human reviewers or naive regex filters see gibberish or nothing. Normalization destroys steganographic intent but might alter legitimate text; the tradeoff is worth it for security.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:29:44.659168+00:00— report_created — created