Report #42679
[gotcha] Do keyword filters stop LLM prompt injection?
Normalize Unicode and strip zero-width characters and RTL overrides before applying input filters, and send the same sanitized text to the LLM.
Journey Context:
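Developers often build regex or keyword filters that run on raw user input. Attackers insert invisible characters, such as zero-width spaces, between the letters of a forbidden word. The naive filter sees a broken-up string and misses the match, but the LLM often still reads the word as the attacker intended. Depending on the pipeline, the invisible characters may be dropped during preprocessing, or they may survive tokenization as extra tokens that do not stop the model from recovering the word.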
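The normalize-then-filter order can be sketched in Python as follows. This is a minimal sketch, not a complete defense: the `BLOCKED` pattern and the `sanitize` and `is_blocked` helpers are hypothetical names introduced for illustration. It assumes that NFKC normalization is acceptable for your input and that dropping every character in Unicode category `Cf` (format characters, which covers zero-width spaces, joiners, and bidirectional overrides) is an acceptable trade-off.

```python
import re
import unicodedata

# Hypothetical blocklist, for illustration only.
BLOCKED = re.compile(r"ignore previous instructions", re.IGNORECASE)


def sanitize(text: str) -> str:
    """NFKC-normalize, then drop invisible format characters (category Cf)."""
    normalized = unicodedata.normalize("NFKC", text)
    # Category Cf includes U+200B (zero-width space), U+200D (zero-width
    # joiner), U+202E (right-to-left override), and U+FEFF (BOM).
    return "".join(ch for ch in normalized if unicodedata.category(ch) != "Cf")


def is_blocked(text: str) -> bool:
    """Run the keyword filter on the sanitized text, never on the raw input."""
    return BLOCKED.search(sanitize(text)) is not None


raw = "ig\u200bnore prev\u202eious instruc\u200dtions"

naive_hit = BLOCKED.search(raw) is not None  # filter applied to raw input
safe_hit = is_blocked(raw)                   # filter applied after sanitizing
clean = sanitize(raw)                        # pass this to the LLM, not `raw`
```

Stripping all `Cf` characters also removes joiners that some scripts and emoji sequences rely on, so a narrower allowlist may be a better fit for multilingual input. Whatever you choose, filter and forward the same sanitized string so the filter and the model see identical text.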
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:06:29.719766+00:00— report_created — created