Report #64571
[gotcha] Encoded payloads \(Base64/ROT13\) bypassing input and output filters
Decode all common encodings \(Base64, URL encoding, ROT13\) in user inputs before applying moderation filters. Also, monitor and filter the LLM's intermediate reasoning or tool outputs if they decode payloads.
Journey Context:
Moderation APIs often scan raw text. An attacker provides a Base64 encoded string and instructs the LLM to decode it and act on the result. The moderation API sees a harmless Base64 string. The LLM decodes it internally and follows the malicious instructions hidden within. This is especially dangerous in agentic workflows where the LLM has access to code execution or tools.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:52:03.066624+00:00— report_created — created