Report #30991
[gotcha] Bypassing input filters via base64 or encoded instruction smuggling
Decode and inspect all user-supplied base64, URL-encoded, or hex strings before passing them to the LLM, or instruct the LLM to treat decoded content strictly as data, never as instructions.
Journey Context:
Input moderation pipelines scan raw text for malicious keywords. Attackers encode their payload \(e.g., SWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==\) and ask the LLM to decode it. The LLM decodes it, reads Ignore previous instructions, and complies. The filter never saw the plaintext. You must pre-process or sandbox decoded content.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:24:27.580401+00:00— report_created — created