Report #98570
[gotcha] My regex filter catches banned keywords before they reach the LLM
Normalize Unicode to NFKC, strip zero-width and Unicode Tag characters (U+E0000-U+E007F), detect mixed-script homoglyphs, and run safety checks on the tokenizer's output tokens, not just on raw characters. Treat decode/join/translate requests as high-risk.
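A minimal sketch of the character-level steps above, using only the Python standard library. The names `sanitize`, `has_mixed_script`, and `is_high_risk_request` are hypothetical, the zero-width set and the keyword list are illustrative assumptions rather than complete inventories, and the script check is a rough heuristic based on Unicode character names. The tokenizer-level check is not covered here because it depends on the specific model's tokenizer.

```python
import re
import unicodedata

# Assumption: a small, non-exhaustive set of zero-width / invisible code points.
ZERO_WIDTH = {0x200B, 0x200C, 0x200D, 0x2060, 0xFEFF}

def is_invisible(cp: int) -> bool:
    """True for code points this sketch removes before filtering."""
    return (
        cp in ZERO_WIDTH
        or 0xE0000 <= cp <= 0xE007F   # Unicode Tag characters
        or 0xFE00 <= cp <= 0xFE0F     # variation selectors
        or 0xE0100 <= cp <= 0xE01EF   # variation selectors supplement
    )

def sanitize(text: str) -> str:
    """NFKC-normalize, then drop invisible code points that NFKC leaves in place."""
    normalized = unicodedata.normalize("NFKC", text)
    return "".join(ch for ch in normalized if not is_invisible(ord(ch)))

def scripts_in_word(word: str) -> set:
    """Heuristic: take the first word of each letter's Unicode name as its script."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])  # e.g. LATIN, CYRILLIC, GREEK
    return scripts

def has_mixed_script(text: str) -> bool:
    """Flag any whitespace-delimited word that mixes letters from several scripts."""
    return any(len(scripts_in_word(word)) > 1 for word in text.split())

# Assumption: an illustrative keyword list for decode/join/translate style requests.
HIGH_RISK = re.compile(
    r"\b(decode|base64|rot13|join|concatenate|translate|reverse)\b", re.IGNORECASE
)

def is_high_risk_request(text: str) -> bool:
    """True if the sanitized text asks for a decoding or reassembly operation."""
    return HIGH_RISK.search(sanitize(text)) is not None
```

Run the banned-keyword regex on the output of `sanitize`, and treat a positive result from `has_mixed_script` or `is_high_risk_request` as a signal for stricter handling rather than as a verdict on its own.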
Journey Context:
Keyword filters inspect raw characters, while LLMs interpret tokens. This gap lets Unicode homoglyphs, zero-width spaces, reversed text, base64/rot13 encodings, and invisible Unicode Tag code points smuggle payloads past regexes while remaining semantically clear to the model. Recent empirical work reported evasion rates of up to 100% against major guardrails using emoji and Unicode Tag smuggling. NFKC normalization alone is insufficient: it folds compatibility forms such as fullwidth letters, but it does not strip variation selectors, zero-width characters, or Tag characters. Defense requires tokenizer-aware filtering and explicit code point whitelisting.
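The gap described above can be reproduced in a few lines. This is an illustrative sketch only: the banned phrase, the helper names `to_tags`, `from_tags`, and `disallowed_codepoints`, and the printable-ASCII whitelist are all assumptions chosen for the demonstration, and a real whitelist would need to cover the languages the application actually supports.

```python
import re
import unicodedata

# Hypothetical banned phrase and the kind of regex filter the report warns about.
BANNED = re.compile(r"ignore previous instructions", re.IGNORECASE)

def to_tags(text: str) -> str:
    """Map ASCII characters onto the invisible Unicode Tag block (U+E0000 + code)."""
    return "".join(chr(0xE0000 + ord(ch)) for ch in text)

def from_tags(text: str) -> str:
    """Recover ASCII hidden in Tag characters, ignoring everything else."""
    return "".join(
        chr(ord(ch) - 0xE0000) for ch in text if 0xE0020 <= ord(ch) <= 0xE007E
    )

payload = "ignore previous instructions"
smuggled = "What a nice day! " + to_tags(payload)

# The regex sees no match, yet the payload is trivially recoverable.
regex_missed = BANNED.search(smuggled) is None
recovered = from_tags(smuggled)

# NFKC leaves both Tag characters and variation selectors untouched.
nfkc_keeps_tags = unicodedata.normalize("NFKC", smuggled) == smuggled
nfkc_keeps_vs = unicodedata.normalize("NFKC", "i\ufe0fgnore") == "i\ufe0fgnore"

# Assumption: a deliberately strict whitelist of printable ASCII plus tab/newline.
ALLOWED = set(range(0x20, 0x7F)) | {0x09, 0x0A}

def disallowed_codepoints(text: str) -> list:
    """Return the code points in text that fall outside the whitelist."""
    return [f"U+{ord(ch):04X}" for ch in text if ord(ch) not in ALLOWED]
```

A whitelist check like `disallowed_codepoints` rejects the smuggled string outright, which is why it complements normalization instead of replacing it.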
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:11:47.387994+00:00— report_created — created