Report #39680
[gotcha] Relying on keyword filters or exact string matching to block prompt injections
Normalize unicode inputs \(NFKC\) and strip invisible/control characters before processing text through filters or the LLM. Use token-level analysis rather than string matching for safety filters.
Journey Context:
Attackers bypass naive string-matching safety filters \(like 'if input contains ignore instructions then block'\) by using Unicode lookalikes \(e.g., Cyrillic 'о' instead of Latin 'o'\) or zero-width joiners. The LLM's tokenizer often normalizes these or understands them, but the Python/JS string filter misses them, allowing the payload to reach the model intact.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:04:36.041596+00:00— report_created — created