Report #71206
[gotcha] String-matching or regex filters bypassed by Unicode homoglyphs and invisible characters
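Normalize Unicode (e.g., NFKC) and strip invisible/control characters from user input *before* applying heuristic safety filters or passing it to the LLM. NFKC folds compatibility forms such as fullwidth letters and ligatures, but it does not map cross-script homoglyphs (e.g., Cyrillic 'і', U+0456) to their Latin look-alikes; catching those requires an additional confusables mapping, such as the skeleton defined in Unicode UTS #39.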
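A minimal sketch of the normalization step in Python, assuming "invisible/control characters" means Unicode categories Cf (format) and Cc (control), with tab, CR and newline kept. The names `strip_invisible`, `normalize_for_filter`, `is_blocked` and the blocklist pattern are hypothetical illustrations, not an established API. Characters are stripped before NFKC so a removed zero-width character cannot prevent composition.

```python
import re
import unicodedata

# Controls that are usually legitimate in chat input; all other Cc is dropped.
_KEEP_CONTROLS = {"\n", "\r", "\t"}


def strip_invisible(text: str) -> str:
    """Drop format (Cf) and control (Cc) characters.

    Cf includes zero-width space/joiners, word joiner, BOM, soft hyphen,
    bidi overrides and the Unicode tag characters.
    """
    return "".join(
        ch
        for ch in text
        if ch in _KEEP_CONTROLS or unicodedata.category(ch) not in ("Cf", "Cc")
    )


def normalize_for_filter(text: str) -> str:
    # Strip first, then NFKC, so e.g. "e" + ZWSP + combining acute becomes "é".
    # NOTE: NFKC does not fold cross-script homoglyphs (Cyrillic "і" stays).
    return unicodedata.normalize("NFKC", strip_invisible(text))


# Hypothetical example filter using the phrase from this report.
_BLOCKLIST = re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)


def is_blocked(user_input: str) -> bool:
    """Run the heuristic filter on normalized text, never on the raw input."""
    return _BLOCKLIST.search(normalize_for_filter(user_input)) is not None
```

The output of `normalize_for_filter`, not the raw input, is what should then be forwarded to the model, so the filter and the LLM see identical text. One trade-off of dropping all Cf characters: removing the zero-width joiner splits multi-part emoji sequences into their component emoji.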
Journey Context:
Developers build input filters to block phrases like 'ignore previous instructions'. Attackers bypass them with Cyrillic homoglyphs (e.g., 'і' instead of 'i') or zero-width characters. The regex no longer matches, yet the LLM still reads the text as the intended malicious instruction, even though the tokenizer sees different code points than the filter was written for.
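The bypass, and one way to close it, can be sketched in Python as follows. This assumes a hand-picked fold table standing in for a proper confusables mapping; the table, `fold_for_matching`, `naive_blocked` and `folded_blocked` are all hypothetical names, and a real filter would derive the mapping from Unicode's confusables data (UTS #39) instead of a short list.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny fold table (NOT a complete confusables list).
_FOLD_TABLE = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0455": "s",  # CYRILLIC SMALL LETTER DZE
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u03bf": "o",  # GREEK SMALL LETTER OMICRON
    # Zero-width characters are deleted outright.
    "\u200b": None,  # ZERO WIDTH SPACE
    "\u200c": None,  # ZERO WIDTH NON-JOINER
    "\u200d": None,  # ZERO WIDTH JOINER
    "\u2060": None,  # WORD JOINER
    "\ufeff": None,  # ZERO WIDTH NO-BREAK SPACE (BOM)
})

PHRASE = re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)


def fold_for_matching(text: str) -> str:
    # NFKC handles compatibility forms, casefold lowercases (so Cyrillic
    # capitals reach the table), translate maps look-alikes and drops
    # zero-width characters.
    return unicodedata.normalize("NFKC", text).casefold().translate(_FOLD_TABLE)


def naive_blocked(text: str) -> bool:
    # The vulnerable pattern: regex applied directly to raw input.
    return PHRASE.search(text) is not None


def folded_blocked(text: str) -> bool:
    # Same regex, applied to the folded comparison form.
    return PHRASE.search(fold_for_matching(text)) is not None


homoglyph_attack = "\u0456gnore previous \u0456nstructions"  # Cyrillic "і"
zero_width_attack = "ig\u200bnore previous instructions"    # ZWSP inside "ignore"

print(naive_blocked(homoglyph_attack), folded_blocked(homoglyph_attack))    # False True
print(naive_blocked(zero_width_attack), folded_blocked(zero_width_attack))  # False True
```

The folded string is meant only for comparison, in the spirit of a UTS #39 skeleton; it should not be sent to the model or shown to users, because it rewrites legitimate Cyrillic or Greek text.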
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:05:36.451954+00:00— report_created — created