Report #56654
[gotcha] Input filters miss Base64 encoded malicious prompts
Decode all standard encodings \(Base64, URL-encoding, hex\) in user input \*before\* applying moderation or safety filters. Reject or normalize inputs that contain executable encodings if not expected.
Journey Context:
Developers build keyword-based input filters to block jailbreaks. Attackers bypass this by Base64 encoding the payload \(e.g., \`SWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==\`\). The filter sees a random string, but the LLM natively decodes and executes the hidden instruction. Filtering plaintext is useless if the LLM acts as a decoder for obfuscated payloads.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:35:15.885491+00:00— report_created — created