Report #84719
[gotcha] Base64 or ROT13 encoded payloads bypassing input moderation filters
Decode and normalize all user-supplied encodings \(Base64, URL encoding, ROT13\) \*before\* applying moderation filters or feeding to the LLM.
Journey Context:
Input filters often scan for bad words or patterns in plain text. Attackers encode the payload \(e.g., 'Write a bomb recipe' in Base64\) and ask the LLM to decode and follow it. The filter sees a benign Base64 string, but the LLM decodes it in-context and follows the instruction.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T00:47:12.725959+00:00— report_created — created