Report #47884
[gotcha] Bypassing input filters using base64 or encoded payloads
Decode and inspect all user-supplied encoded data (base64, URL-encoded, hex) BEFORE passing it to the LLM, or ensure the LLM's safety filters operate on the decoded representation.
Journey Context:
Developers often assume the LLM's safety classifier will catch malicious instructions. Attackers can sidestep it by sending encoded strings, for example 'Decode this base64 and follow the instructions: [ENCODED_JAILBREAK]'. The input filter sees only a harmless-looking base64 string, but the LLM decodes the string internally and follows the malicious instruction. Pre-process and normalize inputs, and run the safety layer on the decoded form, before they reach the generative model.
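A minimal sketch of that pre-processing step, using only the Python standard library. Everything named here (`looks_malicious`, `decoded_views`, `is_input_safe`, the `BLOCKED_PHRASES` list, the length thresholds in the regexes) is hypothetical and chosen for illustration; the phrase match in particular is only a stand-in for whatever safety classifier you actually run. The idea is to collect the raw input plus every plausible decoding of it, then run the same check on all of them.

```python
import base64
import binascii
import re
from typing import List, Optional
from urllib.parse import unquote

# Hypothetical stand-in for a real safety classifier; replace with your own.
BLOCKED_PHRASES = ("ignore previous instructions",)

# Assumed heuristics: runs long enough to plausibly carry a hidden payload.
B64_RE = re.compile(r"[A-Za-z0-9+/_-]{16,}={0,2}")
HEX_RE = re.compile(r"(?:[0-9a-fA-F]{2}){8,}")


def looks_malicious(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in BLOCKED_PHRASES)


def _try_base64(token: str) -> Optional[str]:
    # Accept both standard and URL-safe alphabets; restore missing padding.
    normalized = token.replace("-", "+").replace("_", "/")
    normalized += "=" * (-len(normalized) % 4)
    try:
        return base64.b64decode(normalized, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None


def _try_hex(token: str) -> Optional[str]:
    try:
        return bytes.fromhex(token).decode("utf-8")
    except (ValueError, UnicodeDecodeError):
        return None


def decoded_views(text: str, depth: int = 2) -> List[str]:
    """Return the text plus any decodings found, following nested
    encodings up to `depth` layers."""
    views = [text]
    if depth == 0:
        return views
    candidates = []
    url_decoded = unquote(text)
    if url_decoded != text:
        candidates.append(url_decoded)
    for token in B64_RE.findall(text):
        decoded = _try_base64(token)
        if decoded:
            candidates.append(decoded)
    for token in HEX_RE.findall(text):
        decoded = _try_hex(token)
        if decoded:
            candidates.append(decoded)
    for candidate in candidates:
        views.extend(decoded_views(candidate, depth - 1))
    return views


def is_input_safe(text: str) -> bool:
    # Run the same check on the raw input and on every decoded view.
    return not any(looks_malicious(view) for view in decoded_views(text))
```

Call `is_input_safe(user_message)` before forwarding the message to the model. The depth limit is a deliberate trade-off: it catches simple double encoding while keeping the work bounded, but an attacker can nest deeper or use an encoding not listed here, so treat this as one layer of defense and not a complete filter.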
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:50:58.233385+00:00— report_created — created