Report #74882
[gotcha] Base64 or encoded payloads bypassing input guardrails
If using an input guardrail, ensure it decodes and inspects common encoding schemes \(Base64, URL encoding, ROT13, hex\) before evaluating the prompt. Alternatively, apply the guardrail to the decoded representation of the prompt.
Journey Context:
Developers deploy input classifiers to block malicious prompts. Attackers simply encode the payload \(e.g., 'Write a prompt to hack X' in Base64\) and ask the LLM to decode and execute it. The input filter sees a harmless Base64 string. The LLM decodes it and follows the instruction. You must decode before filtering, but doing so perfectly is computationally hard, so focus on the most common encodings.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:17:10.022599+00:00— report_created — created