Report #40344
[gotcha] Jailbreaks bypassing content filters using Unicode lookalikes or invisible characters
Normalize Unicode input (NFKC) and strip invisible/control characters (like zero-width spaces or RTL overrides) before passing it to the LLM or safety filters.
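A minimal sketch of that fix in Python, using only the standard-library `unicodedata` module. The helper name `sanitize_text` and the choice to keep newline, tab, and carriage return are assumptions for illustration, not part of the original report. Dropping every format (Cf) character also removes zero-width joiners, which can break some emoji sequences and scripts that rely on them, so treat the category list as a starting point.

```python
import unicodedata

# Whitespace controls we assume are worth keeping (illustrative choice).
KEEP = {"\n", "\t", "\r"}

def sanitize_text(text: str) -> str:
    """Hypothetical helper: NFKC-normalize, then drop invisible/control characters."""
    # NFKC folds compatibility forms (e.g. full-width letters) to plain equivalents.
    normalized = unicodedata.normalize("NFKC", text)
    # Cc = control characters, Cf = format characters
    # (zero-width space U+200B, RTL override U+202E, BOM U+FEFF, ...).
    return "".join(
        ch for ch in normalized
        if ch in KEEP or unicodedata.category(ch) not in ("Cc", "Cf")
    )
```

Run the same function on the text seen by the safety filter and on the text sent to the model, so both see identical input. NFKC does not fold lookalikes from other scripts (for example Cyrillic 'а' for Latin 'a'), so this covers only the compatibility-form and invisible-character cases described here.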
Journey Context:
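Content filters often look for exact string matches or token sequences. Attackers write 'sｙstem' (with a full-width 'ｙ', U+FF59) or hide a zero-width space (U+200B) between the letters of 'system'. The filter misses both variants, but the LLM's tokenizer might still map the text to the original token, or the LLM may have learned to interpret it anyway. NFKC normalization folds the full-width form back to ASCII, and stripping invisible characters removes the zero-width space, so both tricks collapse before the text reaches the filter or the model.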
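The bypass can be reproduced with a toy substring filter. This is an illustrative sketch only: `naive_filter_hits` and the blocked keyword are hypothetical stand-ins for a real content filter. It also shows why NFKC alone is not enough, since U+200B has no compatibility mapping and survives normalization.

```python
import unicodedata

BLOCKED = "system"  # illustrative blocked keyword, not from a real filter

def naive_filter_hits(text: str) -> bool:
    """Toy exact-match filter standing in for a real content filter."""
    return BLOCKED in text.lower()

fullwidth = "s\uff59stem"    # full-width 'y' (U+FF59)
zero_width = "sys\u200btem"  # zero-width space (U+200B) inside the word

# Both obfuscated variants slip past the exact-match check.
print(naive_filter_hits(fullwidth))   # False
print(naive_filter_hits(zero_width))  # False

# NFKC repairs the full-width variant but leaves U+200B in place.
nfkc_fullwidth = unicodedata.normalize("NFKC", fullwidth)
nfkc_zero_width = unicodedata.normalize("NFKC", zero_width)
print(naive_filter_hits(nfkc_fullwidth))   # True
print(naive_filter_hits(nfkc_zero_width))  # False

# Dropping format (Cf) characters closes the remaining gap.
stripped = "".join(
    ch for ch in nfkc_zero_width if unicodedata.category(ch) != "Cf"
)
print(naive_filter_hits(stripped))  # True
```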
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:11:24.933354+00:00— report_created — created