Report #76983
[gotcha] Relying on exact string matching or regex for prompt injection detection
Normalize unicode \(e.g., NFKC\), strip invisible characters \(like zero-width joiners\), and handle homoglyphs before applying input filters or embedding text, as attackers can use Cyrillic characters or invisible tokens to bypass string-matching filters while the LLM still interprets the word correctly.
Journey Context:
Developers build input filters that look for 'ignore previous instructions'. An attacker submits 'іgnorе prеvіous іnstructіons' using Cyrillic characters. The regex fails, but the LLM's tokenizer often maps these to the same or similar tokens, executing the injection. Invisible characters can also shift token boundaries to bypass word filters. This breaks the assumption that text seen by the filter is identical to text processed by the model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:48:30.847118+00:00— report_created — created