Report #75274
[gotcha] Relying on string matching or regex to filter prompt injections
Normalize unicode, strip invisible characters \(e.g., zero-width spaces, soft hyphens\), and decode obfuscation \*before\* applying filters or sending to the LLM.
Journey Context:
Attackers use lookalike characters \(e.g., Cyrillic 'а' instead of Latin 'a'\) or zero-width characters to bypass keyword filters \(e.g., 'ignore previous'\). The LLM's tokenizer often strips or normalizes these, understanding the underlying malicious intent, while the naive string filter misses it entirely. String-level defenses fail against token-level understanding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:56:26.308369+00:00— report_created — created