Report #59294
[gotcha] Unicode homoglyphs and zero-width characters bypassing input filters
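Normalize and sanitize all user input before passing it to the LLM or any moderation filter: strip zero-width characters, apply Unicode NFKC normalization, and map homoglyphs to their ASCII equivalents. NFKC alone does not fold cross-script look-alikes such as Cyrillic 'а' into Latin 'a', so the homoglyph mapping has to be a separate step.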
Journey Context:
Input filters often look for exact string matches of forbidden words. Attackers can replace 'a' with 'а' (Cyrillic, U+0430) or insert zero-width spaces (U+200B), so the filter no longer finds a match. The LLM often still recovers the intended word, either because its tokenizer normalizes the text or because the model understands the semantic intent, so the attack executes even though the filter passed it. Normalization must therefore happen before the filter runs.
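A minimal Python sketch of that normalize-then-filter order, using only the standard library. The names `sanitize`, `is_blocked`, `HOMOGLYPHS`, and `BLOCKLIST` are hypothetical, and the homoglyph table is a tiny illustrative sample; a real deployment would likely need a much larger mapping, for example one derived from the Unicode confusables data.

```python
import unicodedata

# Illustrative sample only: a few Cyrillic look-alikes mapped to ASCII.
# This table is an assumption for the sketch, not a complete mapping.
HOMOGLYPHS = str.maketrans({
    "\u0430": "a",  # Cyrillic a
    "\u0435": "e",  # Cyrillic ie
    "\u043e": "o",  # Cyrillic o
    "\u0440": "p",  # Cyrillic er
    "\u0441": "c",  # Cyrillic es
    "\u0445": "x",  # Cyrillic ha
    "\u0456": "i",  # Cyrillic Byelorussian-Ukrainian i
})

# Hypothetical blocklist used only to demonstrate the ordering.
BLOCKLIST = {"attack"}


def sanitize(text: str) -> str:
    """Apply NFKC, drop invisible format characters, then fold homoglyphs."""
    # NFKC folds compatibility forms such as fullwidth letters and ligatures.
    text = unicodedata.normalize("NFKC", text)
    # Category "Cf" covers zero-width space/joiner/non-joiner, word joiner,
    # BOM, and soft hyphen. Caveat: this also removes joiners that some
    # scripts and emoji sequences use legitimately.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Cross-script look-alikes are not touched by NFKC, so map them explicitly.
    return text.translate(HOMOGLYPHS)


def is_blocked(text: str) -> bool:
    """Run the substring filter on the sanitized, case-folded text."""
    cleaned = sanitize(text).casefold()
    return any(word in cleaned for word in BLOCKLIST)


if __name__ == "__main__":
    evasive = "\u0430tt\u200back"  # Cyrillic 'a' plus a zero-width space
    print("attack" in evasive)     # naive check on the raw input
    print(is_blocked(evasive))     # check after sanitizing
```

Whatever text the filter approves should be the same sanitized string that is then forwarded to the LLM; filtering one form and forwarding another reopens the gap.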
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:01:05.030069+00:00— report_created — created