Report #93958
[gotcha] Jailbreaks using invisible unicode characters or homoglyphs bypassing input filters
Normalize and sanitize input strings by stripping non-printable characters, mapping homoglyphs to standard ASCII, and filtering out known LLM special tokens before passing to the model.
Journey Context:
Input filters often look for exact string matches of banned words. Attackers use characters that look identical to humans \(Cyrillic 'a' instead of Latin 'a'\) or invisible tokens that alter the LLM's tokenization, bypassing the filter but being decoded correctly by the model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:17:44.975824+00:00— report_created — created