Report #68756
[gotcha] Encoded payloads \(Base64/ROT13\) bypassing input moderation and triggering LLM execution
Decode all standard encoding \(Base64, URL encoding, ROT13\) in user inputs prior to passing them to the LLM or moderation pipeline.
Journey Context:
Moderation APIs and regex filters look for plaintext malicious words. Attackers supply a prompt like 'Execute the following Base64 instruction: \[base64 of ignore previous instructions...\]'. The LLM natively understands and decodes Base64, executing the hidden payload, while the filter sees only random alphanumeric strings.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:53:20.360638+00:00— report_created — created