Report #64506
[gotcha] Relying on simple string matching or regex to block malicious prompts, bypassed by Unicode homoglyphs
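Normalize text to NFKC and fold cross-script look-alikes (for example with the Unicode UTS #39 confusables/skeleton mapping) before applying input filters or guardrails, and make sure the filter evaluates the same text the LLM's tokenizer will receive.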
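A minimal Python sketch of normalizing before filtering. It assumes a tiny hand-picked confusables table (illustrative only; a real filter would load the full UTS #39 confusables data) and also strips a few zero-width characters, which is an extra step not named above. `normalize_for_filter`, `is_blocked`, `CONFUSABLES` and `BLOCKLIST` are hypothetical names, not a library API.

```python
import re
import unicodedata

# Illustrative subset only (assumption): load the full UTS #39
# confusables.txt mapping in a real deployment.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0456": "i",  # CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I
})

# Zero-width characters that can be inserted to split a keyword (assumption:
# a short hand-picked list, not exhaustive).
INVISIBLES = dict.fromkeys(map(ord, "\u200b\u200c\u200d\u2060\ufeff"), None)

# Hypothetical blocklist pattern, matched against normalized text only.
BLOCKLIST = re.compile(r"ignore\s+previous\s+instructions")


def normalize_for_filter(text: str) -> str:
    """NFKC, drop zero-width chars, casefold, then fold known look-alikes."""
    text = unicodedata.normalize("NFKC", text)  # fullwidth, ligatures, etc.
    text = text.translate(INVISIBLES)           # remove zero-width characters
    text = text.casefold()                      # uppercase Cyrillic -> lowercase
    return text.translate(CONFUSABLES)          # Cyrillic look-alikes -> Latin


def is_blocked(text: str) -> bool:
    """Run the blocklist on the normalized form, never on the raw input."""
    return BLOCKLIST.search(normalize_for_filter(text)) is not None
```

Casefolding and confusable folding are lossy, so in this sketch the folded string is used only for the allow/block decision; whether the model should receive the NFKC-normalized text or the raw input is a separate design choice.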
Journey Context:
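Developers build input filters that block 'ignore previous instructions'. An attacker sends 'іgnorе prеvіous іnstructіons' (using Cyrillic 'і' and 'е'). The regex fails to match, yet the LLM often still understands the intended instruction and executes the attack: the tokenizer typically splits the look-alike characters into different tokens than the English letters, but the model has learned to read them as the same words. Filtering must happen on a normalized representation; otherwise you are playing whack-a-mole with visually identical character variations. Note that NFKC alone folds compatibility forms (fullwidth letters, ligatures, mathematical alphanumerics) but does not map Cyrillic 'і' or 'е' to Latin, so cross-script homoglyphs also need a confusables mapping such as the one defined in Unicode UTS #39.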
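To check why NFKC on its own does not stop a homoglyph attack like this one, the sketch below lists the non-ASCII letters left in a string after normalization. `suspicious_chars` is a hypothetical helper for illustration, not a library API; only `unicodedata.normalize` and `unicodedata.name` come from the Python standard library.

```python
import unicodedata


def suspicious_chars(text: str) -> list[tuple[str, str]]:
    """Return (char, Unicode name) for every non-ASCII letter in text."""
    return [
        (ch, unicodedata.name(ch, "UNKNOWN"))
        for ch in text
        if not ch.isascii() and ch.isalpha()
    ]


# Cyrillic 'і' (U+0456) and 'е' (U+0435) in place of Latin 'i' and 'e'.
attack = "\u0456gnor\u0435 previous instructions"
nfkc = unicodedata.normalize("NFKC", attack)

print(nfkc == attack)  # True: NFKC leaves the Cyrillic letters unchanged
print(suspicious_chars(nfkc))  # both Cyrillic letters are still present
print(unicodedata.normalize("NFKC", "\uff49gnore") == "ignore")  # True: fullwidth is folded
```

Logging the output of a helper like this on blocked and allowed inputs is one way to spot which look-alikes your confusables table is still missing.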
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:45:42.539972+00:00— report_created — created