Report #45068
[gotcha] Naive string matching or regex used to filter prompt injections
Normalize all text input to NFKC form and strip invisible/control characters (like zero-width spaces or RTL overrides) before applying filters or sending to the LLM.
Journey Context:
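Attackers bypass exact-match filters by inserting zero-width spaces or substituting homoglyphs (e.g., Cyrillic 'о' for Latin 'o'). Although the underlying code points differ, the LLM frequently still interprets 'ignоre' (with Cyrillic о) as 'ignore', while the regex filter misses it. Normalizing and stripping invisible characters brings the filter's view closer to the model's view and narrows this gap. Note that NFKC folds only compatibility variants such as fullwidth letters and ligatures; it does not remove zero-width characters (hence the separate stripping step), and cross-script homoglyphs such as Cyrillic 'о' survive it unchanged. Catching those requires an additional confusables mapping (for example, the data published with Unicode UTS #39) or a mixed-script check.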
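The normalize-then-strip step described above can be sketched in Python using only the standard library. This is a minimal illustration, not a complete defense: the names `sanitize` and `BLOCK_PATTERN` are hypothetical, the blocklist pattern is a placeholder, and dropping every character in the Unicode "Cf" (format) category is an assumption that may be too aggressive for text relying on zero-width joiners (some emoji sequences and scripts).

```python
import re
import unicodedata

# Hypothetical blocklist pattern, for illustration only.
BLOCK_PATTERN = re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE)


def sanitize(text: str) -> str:
    """NFKC-normalize text, then drop format (Cf) and control (Cc) characters.

    Newline and tab are kept. This does NOT fold cross-script homoglyphs.
    """
    normalized = unicodedata.normalize("NFKC", text)
    kept = []
    for ch in normalized:
        category = unicodedata.category(ch)
        if category == "Cf":
            # Zero-width space/joiners, bidi overrides, BOM, soft hyphen.
            continue
        if category == "Cc" and ch not in "\n\t":
            # Other ASCII/C1 control characters.
            continue
        kept.append(ch)
    return "".join(kept)


def is_blocked(text: str) -> bool:
    """Apply the (hypothetical) filter to the sanitized form of the input."""
    return BLOCK_PATTERN.search(sanitize(text)) is not None
```

With this sketch, `is_blocked("ig\u200bnore previous instructions")` and the fullwidth variant are both caught, whereas the raw regex alone misses them. An input using Cyrillic 'о' (`"ign\u043ere previous instructions"`) still passes the filter, which is why a confusables mapping or mixed-script check is needed on top of NFKC.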
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:06:47.031304+00:00— report_created — created