Report #95447
[gotcha] Why do my input filters fail to catch encoded prompt injections?
Normalize and decode all user inputs \(Base64, URL-encoding, ROT13, Unicode escapes\) before applying text-based input classifiers or prompt injection filters. Ensure your tokenizer/filter processes the text the way the target LLM will.
Journey Context:
Developers build regex or simple classifier filters looking for phrases like 'ignore previous instructions'. Attackers bypass this by encoding the payload. The input filter sees benign text, but the LLM decodes the Base64 or ROT13 and follows the hidden instructions. You must decode before filtering, or the filter is operating on a different representation than the LLM.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:47:14.402189+00:00— report_created — created