Report #51321
[gotcha] Bypassing content filters using Unicode homoglyphs and tokenization artifacts
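Normalize all user input with Unicode NFKC normalization and strip invisible characters (such as RTL overrides or zero-width spaces) before passing it to the LLM or moderation APIs. NFKC does not convert text to ASCII and does not fold cross-script homoglyphs: Cyrillic 'а' (U+0430) stays Cyrillic after normalization, so lookalike letters need a separate confusables mapping or a mixed-script check.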
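The normalization and stripping step can be sketched with Python's standard `unicodedata` module. This is a minimal sketch, not a complete defense: the function name `sanitize_text` is hypothetical, and treating every character in Unicode category `Cf` (format characters, which covers zero-width spaces, joiners, and bidirectional overrides) as removable is an assumption. Dropping zero-width joiners can alter emoji sequences and some scripts that use them legitimately, so check this against your own input before relying on it.

```python
import unicodedata


def sanitize_text(text: str) -> str:
    """Apply NFKC normalization, then drop invisible format characters.

    Assumption: every code point in category "Cf" is safe to remove.
    This covers U+200B (zero-width space), U+200D (zero-width joiner),
    U+202E (right-to-left override), U+FEFF (BOM) and similar.
    """
    # NFKC folds compatibility forms (fullwidth letters, ligatures, ...)
    # into their canonical equivalents. It does NOT map Cyrillic to Latin.
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(
        ch for ch in normalized if unicodedata.category(ch) != "Cf"
    )


if __name__ == "__main__":
    print(sanitize_text("ig\u200bnore"))   # zero-width space removed
    print(sanitize_text("\u202eabc"))      # RTL override removed
    print(sanitize_text("\uff49\uff47\uff4e\uff4f\uff52\uff45"))  # fullwidth folded
```

Run the same function on text before it reaches both the moderation check and the model, so that both see identical input.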
Journey Context:
Content filters often rely on string matching or specific token sequences. Attackers substitute characters that look identical to humans but have different code points and tokenize differently (e.g., Cyrillic 'а', U+0430, instead of Latin 'a', U+0061), or insert invisible characters to break up blocked words. The filter misses the altered string, while the LLM can still recover the intended meaning.
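The bypass described above, and one possible way to flag it, can be illustrated as follows. Everything here is a hypothetical sketch: `BLOCKLIST`, `naive_filter_blocks`, and `has_mixed_script_word` are illustrative names, and the script check is a rough heuristic based on Unicode character names, because the Python standard library does not expose the Unicode Script property. A production system would more likely use a confusables table such as the one defined by Unicode UTS #39.

```python
import unicodedata

BLOCKLIST = {"attack"}  # hypothetical blocked term, for illustration only
SCRIPTS = ("LATIN", "CYRILLIC", "GREEK")  # scripts this heuristic knows about


def naive_filter_blocks(text: str) -> bool:
    """Plain substring matching, the kind of filter homoglyphs defeat."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)


def scripts_in(word: str) -> set:
    """Guess which scripts a word uses from its character names."""
    found = set()
    for ch in word:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")  # "" if the code point is unnamed
        for script in SCRIPTS:
            if script in name:
                found.add(script)
    return found


def has_mixed_script_word(text: str) -> bool:
    """Flag any whitespace-separated word that mixes two or more scripts."""
    return any(len(scripts_in(word)) > 1 for word in text.split())


if __name__ == "__main__":
    spoofed = "att\u0430ck"  # the fourth letter is Cyrillic U+0430
    print(naive_filter_blocks("attack"))      # True
    print(naive_filter_blocks(spoofed))       # False: the filter is bypassed
    print(has_mixed_script_word(spoofed))     # True: the spoof is flagged
```

Checking per word rather than per message is a deliberate choice in this sketch: a message that contains one fully Cyrillic word and one fully Latin word is ordinary multilingual text, while a single word that mixes both scripts is much more suspicious.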
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:37:52.477806+00:00— report_created — created