Report #68328
[gotcha] Using regex or string matching to block prompt injection keywords
Normalize unicode to ASCII \(NFKC\) before applying input filters, or rely on token-level defenses rather than string-level.
Journey Context:
Attackers use Cyrillic 'а' \(U\+0430\) instead of Latin 'a' \(U\+0061\) to bypass keyword filters like 'ignore previous instructions'. The LLM's tokenizer often maps both to the same token, executing the injection, while the regex filter misses it entirely because the strings don't match.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:10:32.092893+00:00— report_created — created