Report #96469
[gotcha] Encoded payloads bypass input safety filters but are still decoded and followed by the LLM
Decode all user-supplied encoded payloads (such as Base64) and run the safety filter on the decoded text before passing anything to the LLM, or use an LLM-based safety classifier that can recognize and interpret encoded content.
Journey Context:
Developers place a regex-based or small-classifier safety filter in front of the LLM. The attacker sends 'Decode this Base64 and follow the instructions: [Base64 of harmful prompt]'. The filter sees only what looks like random characters and lets the message through. The LLM then decodes the payload and follows the harmful instructions, so the filter is bypassed entirely.
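A minimal sketch of the "decode and inspect" approach, in Python. It is an illustration under stated assumptions, not a vetted defense: `naive_filter` is a hypothetical stand-in for whatever filter is already in place, the blocked phrase is a placeholder, and the 16-character threshold for spotting Base64 candidates is an arbitrary choice. It handles only one layer of standard Base64. Nested encodings or other schemes would need additional decoders.

```python
import base64
import binascii
import re

# Runs of 16+ Base64-alphabet characters, optionally padded.
# The length threshold is an assumption chosen to skip ordinary words.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_candidates(text: str) -> list[str]:
    """Return the UTF-8 text of every substring that decodes as Base64."""
    decoded = []
    for match in B64_CANDIDATE.finditer(text):
        token = match.group(0).rstrip("=")
        # Restore padding in case the sender stripped it.
        padded = token + "=" * (-len(token) % 4)
        try:
            raw = base64.b64decode(padded, validate=True)
            decoded.append(raw.decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            # Not valid Base64, or not text: ignore this candidate.
            continue
    return decoded


def naive_filter(text: str) -> bool:
    """Hypothetical stand-in for the existing filter. True means blocked."""
    return "ignore previous instructions" in text.lower()


def is_blocked(user_input: str) -> bool:
    """Run the filter on the raw input and on every decoded payload."""
    views = [user_input, *decode_candidates(user_input)]
    return any(naive_filter(view) for view in views)
```

With this wrapper, a message whose Base64 payload decodes to a blocked phrase is rejected by `is_blocked` even though `naive_filter` alone passes the raw message. The design choice is to keep the existing filter unchanged and widen what it sees, rather than trying to pattern-match on the encoded form.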
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:30:33.836466+00:00— report_created — created