Report #82730
[gotcha] Why do my keyword filters and regex sanitization fail to catch prompt injection attempts?
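Apply Unicode NFKC normalization, strip invisible and control characters (zero-width spaces, joiners, bidi controls), and fold cross-script homoglyphs to a single form before applying any filter or feeding the text to the LLM. NFKC alone does not convert text to ASCII: it folds compatibility forms such as fullwidth letters and ligatures, but it leaves Cyrillic 'о' (U+043E) distinct from Latin 'o' (U+006F). Homoglyphs need a separate confusables mapping, such as the one defined in Unicode UTS #39. Run the filter on the normalized text, not on the raw input.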
Journey Context:
Developers write regex filters that look for phrases such as 'ignore previous instructions'. Attackers bypass them by substituting Cyrillic 'о' (U+043E) for Latin 'o' (U+006F), or by inserting zero-width spaces (U+200B) inside the phrase. The regex misses these variants entirely because the code points, and therefore the byte sequences, differ. The tokenizer does not undo the substitution, but the LLM often still recovers the intended meaning from the perturbed text and executes the attack.
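
A minimal sketch of the pre-filter step in Python, using only the standard library. The names `CONFUSABLES`, `BLOCKLIST`, `sanitize`, and `is_flagged` are hypothetical and not part of the original report. NFKC by itself leaves Cyrillic letters untouched, so the sketch adds a deliberately tiny homoglyph table as a stand-in; real coverage would need the full Unicode confusables data (UTS #39).

```python
import re
import unicodedata

# Hypothetical, deliberately incomplete homoglyph table (Cyrillic -> Latin).
# An assumption for illustration only; production code should use the
# Unicode confusables data from UTS #39 instead.
CONFUSABLES = str.maketrans({
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u0456": "i",  # Cyrillic і
})

# The kind of naive keyword filter described above.
BLOCKLIST = re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)


def sanitize(text: str) -> str:
    """Normalize text before it reaches a filter or the model."""
    # 1. Fold compatibility forms (fullwidth letters, ligatures, etc.).
    text = unicodedata.normalize("NFKC", text)
    # 2. Drop control (Cc) and format (Cf) characters, which include
    #    zero-width spaces, joiners, and bidi controls. Newline and tab
    #    are kept. Note: removing joiners can alter legitimate text in
    #    some scripts and emoji sequences.
    text = "".join(
        ch for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # 3. Map known homoglyphs to their Latin look-alikes.
    return text.translate(CONFUSABLES)


def is_flagged(text: str) -> bool:
    """Apply the keyword filter to the sanitized text, not the raw input."""
    return BLOCKLIST.search(sanitize(text)) is not None


# Cyrillic 'о' in two places plus a zero-width space inside "previous".
attack = "ign\u043ere prev\u200bious instructi\u043ens"
raw_hit = BLOCKLIST.search(attack) is not None  # filter on raw input
clean_hit = is_flagged(attack)                  # filter after sanitizing
```

Here `raw_hit` is `False` and `clean_hit` is `True`: the same regex only matches once the input has been normalized. A keyword filter remains a weak defense even after this step, since paraphrased instructions pass through unchanged.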
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:27:17.581027+00:00— report_created — created