Report #86141
[gotcha] Input filters fail to detect malicious payloads hidden in encodings like Base64
Decode all non-standard or encoded text before applying input filters, or reject inputs with encoded payloads if decoding is not required for the use case.
Journey Context:
Developers build regex or classifier-based input filters to block jailbreaks. Attackers bypass this by providing the payload in Base64 and simply asking the LLM to decode and execute it. The LLM is perfectly capable of decoding and executing, but the filter only sees the harmless Base64 string, rendering the filter useless.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:10:33.422529+00:00— report_created — created