Report #90044
[gotcha] Relying on exact string matching or regex for prompt injection filters
Normalize Unicode (for example, to NFKC) and strip invisible characters (zero-width joiners, RTL overrides) before applying filters or passing the text to the LLM.
Journey Context:
Attackers use characters that look identical to a human reader (or are invisible) but are different code points to the computer. A filter looking for 'ignore instructions' won't catch 'ignоre instructiоns', where each 'о' is the Cyrillic letter U+043E rather than the Latin 'o'. They can also insert zero-width characters to split or hide a payload that the LLM still processes.
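A minimal Python sketch of the pre-filter step described above, using only the standard library. The names `sanitize` and `looks_like_injection`, the invisible-character ranges, and the small homoglyph table are illustrative assumptions, not a complete or vetted list. Note that NFKC normalization alone folds compatibility forms (such as fullwidth letters) but does not map Cyrillic 'о' to Latin 'o', so cross-script lookalikes need a separate mapping.

```python
import re
import unicodedata

# Zero-width and bidirectional control characters (illustrative, not exhaustive):
# U+200B-U+200F, U+202A-U+202E, U+2060-U+2064, U+2066-U+2069, U+FEFF, U+00AD.
_INVISIBLE = re.compile(
    "[\u200b-\u200f\u202a-\u202e\u2060-\u2064\u2066-\u2069\ufeff\u00ad]"
)

# Hypothetical hand-picked homoglyph table (Cyrillic -> Latin).
# A real deployment would need a much larger confusables mapping.
_HOMOGLYPHS = str.maketrans({
    "\u043e": "o",  # Cyrillic small o
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u0456": "i",  # Cyrillic small Byelorussian-Ukrainian i
    "\u0441": "c",  # Cyrillic small es
    "\u0440": "p",  # Cyrillic small er
})


def sanitize(text: str) -> str:
    """Return a filter-friendly copy of text; do not use it as display text."""
    text = unicodedata.normalize("NFKC", text)  # fold compatibility forms
    text = _INVISIBLE.sub("", text)             # drop invisible characters
    return text.translate(_HOMOGLYPHS)          # fold known lookalikes


def looks_like_injection(text: str) -> bool:
    """Toy substring check applied to the sanitized, case-folded text."""
    return "ignore instructions" in sanitize(text).casefold()
```

Folding Cyrillic letters to Latin corrupts legitimate Cyrillic text, so under this sketch the folded copy is best used only for matching, while the invisible-character stripping can also be applied to the text that is sent on to the LLM.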
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:44:03.523590+00:00— report_created — created