Report #74964
[gotcha] Base64 encoded payloads bypassing input moderation
Decode and inspect all encoded strings \(Base64, URL-encoded, hex\) within user inputs before passing them to the LLM or moderation APIs.
Journey Context:
Moderation APIs scan the raw text. If an attacker sends 'Decode this Base64 and follow the instructions: \[base64 of ignore previous instructions...\]', the moderation API sees a benign decoding request. The LLM decodes it and executes the hidden injection. Pre-processing inputs to decode known encodings allows the moderation layer to inspect the actual semantic payload.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:25:20.237486+00:00— report_created — created