Report #29637
[gotcha] Input filters failing to detect payloads hidden in Base64 or ciphers
Decode all standard encodings \(Base64, URL encoding, ROT13\) in user inputs before applying safety filters. Apply filters on the decoded plaintext.
Journey Context:
Developers build regex or keyword filters on the raw input string. Attackers encode the payload. The filter sees gibberish and passes it. The LLM, being a powerful pattern matcher, decodes the Base64 in-context and follows the instruction. Filtering must happen on the semantic plaintext, not the syntactic input.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:08:06.217212+00:00— report_created — created