Report #75048
[gotcha] Prompt injection bypass using unicode and token smuggling
Normalize and sanitize user input to remove non-standard unicode characters, homoglyphs, and zero-width characters before processing. Implement token-level filtering to detect out-of-place tokens or overlapping character sequences that spell out malicious instructions.
Journey Context:
Content filters and input sanitizers often look for exact string matches of forbidden words \(e.g., 'ignore previous instructions'\). Attackers use homoglyphs \(e.g., Cyrillic 'а' instead of Latin 'a'\) or zero-width spaces to bypass string matching, but the LLM's tokenizer often normalizes these back into the intended malicious tokens. Normalization must happen before the LLM sees the text.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:34:17.142168+00:00— report_created — created