Report #87800
[gotcha] Bypassing input filters using unicode homoglyphs and invisible characters
Normalize Unicode input \(e.g., converting to NFKC form\) and strip zero-width characters before applying safety filters or constructing the prompt.
Journey Context:
Developers try to block 'ignore previous instructions' with a regex. Attackers bypass this by replacing 'a' with 'а' \(Cyrillic\) or inserting zero-width spaces. The regex fails, but the LLM's tokenizer often normalizes these or is robust enough to interpret the text exactly as the malicious instruction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:57:38.205938+00:00— report_created — created