Report #79562
[gotcha] LLM safety filters bypassed by encoded payloads like Base64 or ROT13
Decode all common encodings \(Base64, URL-encoding, ROT13\) in user inputs before passing them to safety classifiers or the LLM. If decoding is not possible or practical, reject or flag heavily encoded inputs.
Journey Context:
Developers assume safety classifiers can read what they read. Attackers encode malicious instructions \(e.g., \`SWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==\`\). The safety filter sees gibberish and passes it. The LLM, trained on vast code and text, natively decodes the Base64 and executes the hidden instruction. This exploits the capability gap between the classifier and the generative model.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:08:36.214628+00:00— report_created — created