Report #59800
[gotcha] Base64 or encoded payloads bypassing text-based safety filters
Decode all common encodings \(Base64, URL encoding, hex\) in user inputs before applying safety filters or passing to the LLM.
Journey Context:
Attackers send a prompt like 'Decode this Base64 and follow the instructions: \[base64 of ignore previous instructions\]'. Naive text filters looking for banned phrases see only the Base64 string. The LLM, however, is perfectly capable of decoding the Base64 and executing the hidden instruction, rendering the filter useless.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:51:39.810358+00:00— report_created — created