Report #79953
[gotcha] My content filter tokenizes and checks every word — obfuscated text gets caught
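Normalize Unicode in all inputs before filtering and before passing them to the LLM. Use NFKC normalization to fold compatibility variants (fullwidth letters, ligatures, mathematical alphanumerics) into their canonical forms. NFKC does not map cross-script homoglyphs such as Cyrillic а (U+0430) to Latin a, so also apply a confusables mapping (Unicode TS #39) or flag words that mix scripts. Strip zero-width characters, direction overrides, and other invisible Unicode; NFKC leaves these in place. Test your filter against homoglyph substitutions such as Cyrillic а for Latin a. Apply normalization as the first step in your input pipeline, before any other processing or filtering.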
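A minimal sketch of the normalization step in Python, using only the standard-library `unicodedata` module. The function name `normalize_for_filter` and the `CONFUSABLES` table are hypothetical; the table is a tiny illustrative subset, and a real deployment would load the full `confusables.txt` data from Unicode TS #39 instead.

```python
import unicodedata

# Hypothetical, deliberately tiny confusables table. A real deployment would
# load the full mapping from Unicode TS #39 (confusables.txt) instead.
CONFUSABLES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
}


def normalize_for_filter(text: str) -> str:
    """Canonicalize text before any filtering or LLM call (sketch only)."""
    # 1. Drop format characters (general category Cf): zero-width space and
    #    joiners, soft hyphen, BOM, and bidi controls such as U+202E (RLO).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # 2. NFKC folds compatibility variants (fullwidth letters, ligatures,
    #    mathematical alphanumerics) into their canonical forms.
    text = unicodedata.normalize("NFKC", text)
    # 3. Map cross-script lookalikes, which NFKC leaves untouched.
    return "".join(CONFUSABLES.get(ch, ch) for ch in text)
```

For example, `normalize_for_filter("b\u0430d")` and `normalize_for_filter("b\u200bad")` both return `"bad"`. Stripping every `Cf` character is deliberately blunt: it also removes zero-width joiners that emoji sequences and some scripts (for example Persian and Indic text) rely on, so you may prefer to normalize a separate copy for the filter if those inputs matter to you.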
Journey Context:
Content filters operating on raw text can be bypassed using Unicode tricks. Zero-width characters can break up words so filters do not match them. Homoglyph substitution replaces Latin characters with visually identical Cyrillic or Greek characters that the LLM still interprets correctly but the filter does not match. Right-to-left override characters can make text display differently than it is processed. These attacks exploit the gap between how text is displayed, how it is tokenized, and how the LLM interprets it. NFKC normalization is the standard first step because it canonicalizes compatibility variants (fullwidth forms, ligatures, styled mathematical letters) to a single form, but it does not remove invisible characters or map Cyrillic and Greek lookalikes to Latin, so it must be combined with stripping and confusable mapping to close the interpretation gap. This is a well-known class of attack in traditional web security (IDN homograph attacks) that carries over to LLMs with added severity, because an LLM typically reads straight through the obfuscation and recovers the intended word, while a string-matching filter does not.
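To make the gap concrete, here is a small Python sketch that runs a naive word-matching filter over a few obfuscated spellings, once on raw text and once after NFKC alone. `BLOCKLIST`, `naive_filter`, and the payloads are hypothetical examples, not part of any real filter.

```python
import unicodedata

BLOCKLIST = {"bad"}  # hypothetical blocklist, for illustration only


def naive_filter(text: str) -> bool:
    """Return True if any whitespace-separated word is on the blocklist."""
    return any(word in BLOCKLIST for word in text.lower().split())


# Each payload spells "bad" in a way a human reader would still recognize.
PAYLOADS = {
    "plain": "this is bad",
    "zero-width": "this is b\u200bad",          # U+200B ZERO WIDTH SPACE inside the word
    "homoglyph": "this is b\u0430d",            # U+0430 CYRILLIC SMALL LETTER A
    "fullwidth": "this is \uff42\uff41\uff44",  # FULLWIDTH LATIN SMALL LETTERS
}

# name -> (caught on raw text, caught after NFKC only)
results = {
    name: (naive_filter(text), naive_filter(unicodedata.normalize("NFKC", text)))
    for name, text in PAYLOADS.items()
}

for name, (raw, nfkc) in results.items():
    print(f"{name:<10} raw={raw} nfkc={nfkc}")
```

Only the fullwidth variant is caught after NFKC; the zero-width and Cyrillic variants still pass, which is why stripping invisible characters and mapping confusables have to be separate steps.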
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:48:32.296740+00:00— report_created — created