Report #78824
[gotcha] Bypassing content filters using Unicode homoglyphs and token smuggling
Normalize and sanitize user input to ASCII before passing it to the LLM, or implement robust tokenization checks that detect right-to-left overrides, zero-width characters, and homoglyph substitution (e.g., Cyrillic 'a' instead of Latin 'a').
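A minimal sketch of the detection checks named above, using only Python's standard `unicodedata` module. The function name `find_smuggling_signals`, the signal labels, and the two character sets are illustrative assumptions, not part of any library. Script detection here relies on the first word of each character's Unicode name, which is a rough heuristic: it can flag legitimate text that mixes scripts inside one whitespace-delimited word (Japanese, for example).

```python
import unicodedata

# Bidirectional embedding, override, and isolate controls (assumed list).
BIDI_CONTROLS = set("\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069")
# Common invisible characters: ZWSP, ZWNJ, ZWJ, word joiner, BOM (assumed list).
ZERO_WIDTH = set("\u200b\u200c\u200d\u2060\ufeff")


def find_smuggling_signals(text: str) -> list:
    """Return labels for suspicious Unicode features found in `text`."""
    signals = []
    if any(ch in BIDI_CONTROLS for ch in text):
        signals.append("bidi_control")
    if any(ch in ZERO_WIDTH for ch in text):
        signals.append("zero_width")
    # Heuristic: a single word that mixes scripts (e.g. LATIN + CYRILLIC)
    # is a likely homoglyph substitution.
    for word in text.split():
        scripts = {
            unicodedata.name(ch, "").split(" ")[0]
            for ch in word
            if ch.isalpha()
        }
        scripts.discard("")  # characters without a Unicode name
        if len(scripts) > 1:
            signals.append("mixed_script")
            break
    return signals
```

A caller might reject or route to review any input for which the returned list is non-empty; whether to block or merely log is a policy decision this sketch does not make.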
Journey Context:
Content filters and safety classifiers often operate on raw text or on the output of a standard tokenizer. Attackers can bypass them by writing a malicious payload with Unicode characters that look identical to ASCII letters, or by inserting invisible characters, so that the text tokenizes differently and slips past keyword filters while the LLM still interprets its semantic meaning. Normalization destroys the smuggling channel while preserving the intended semantic content for benign users.
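The bypass and the fix can be illustrated with a toy keyword filter. This is a sketch under stated assumptions: `BLOCKLIST`, `naive_filter`, `normalize`, and `normalized_filter` are hypothetical names, and `CONFUSABLES` is a three-entry illustrative subset rather than a real confusables table. One detail worth noting is that Unicode NFKC normalization folds compatibility forms such as fullwidth letters but does not map Cyrillic letters to Latin ones, so cross-script homoglyphs need a separate mapping step.

```python
import unicodedata

BLOCKLIST = {"forbidden"}  # hypothetical blocked keyword

# Illustrative subset of Cyrillic-to-Latin lookalikes (assumption, not complete).
CONFUSABLES = str.maketrans({"\u0430": "a", "\u0435": "e", "\u043e": "o"})


def naive_filter(text: str) -> bool:
    """Keyword match on raw text; True means the input is blocked."""
    return any(word in text.lower() for word in BLOCKLIST)


def normalize(text: str) -> str:
    """NFKC-fold, drop invisible format (Cf) characters, fold lookalikes."""
    text = unicodedata.normalize("NFKC", text)
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    return text.translate(CONFUSABLES)


def normalized_filter(text: str) -> bool:
    """Same keyword match, applied after normalization."""
    return naive_filter(normalize(text))


samples = {
    "zero-width space": "for\u200bbidden",
    "fullwidth letters": "\uff46\uff4f\uff52\uff42\uff49\uff44\uff44\uff45\uff4e",
    "Cyrillic o": "f\u043erbidden",
}
for label, payload in samples.items():
    print(label, naive_filter(payload), normalized_filter(payload))
```

Dropping every `Cf` character also removes zero-width joiners that some scripts and emoji sequences rely on, so a production filter may prefer to run the keyword check on the normalized copy while passing the original text through unchanged.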
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:54:05.176852+00:00— report_created — created