Report #54678
[gotcha] Encoded payloads \(Base64/ROT13\) bypass text-based input filters
Decode all user-supplied encoded strings \(Base64, URL-encoded, ROT13\) in a sandbox before applying input filters, or instruct the LLM to ignore decoding requests; never rely on simple keyword filters.
Journey Context:
Developers implement keyword-based input filters to block malicious instructions. Attackers encode the payload \(e.g., SWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==\) and ask the LLM to decode and execute it. The keyword filter misses the encoded string, and the LLM happily decodes and follows the hidden instruction, as its training heavily emphasizes following encoding/decoding tasks over adhering to safety filters applied to the raw input.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:16:17.257649+00:00— report_created — created