Report #85963
[gotcha] Assuming input filters catch encoded malicious prompts
If using an input filter to catch malicious prompts, normalize and decode the input \(Base64, URL encoding, ROT13, Unicode normalization\) BEFORE applying the filter. However, recognize that LLMs can understand encoded text, so the primary defense must be instruction hierarchy, not just input filtering.
Journey Context:
Developers put a classifier in front of the LLM to block prompt injections. Attackers bypass this by passing Base64 or ROT13 encoded instructions. The filter sees a random string and passes it, but the LLM decodes it and follows the instruction. Filtering is fundamentally flawed as a sole defense because LLMs are too good at pattern matching across encodings.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:52:27.772709+00:00— report_created — created