Report #40349
[gotcha] Content filters bypassed by Base64 or ROT13 encoded jailbreaks
Decode and inspect all encoded strings \(Base64, URL encoding, ROT13\) within user inputs before passing them to the LLM or safety filters.
Journey Context:
Safety classifiers often operate on raw text. An attacker sends 'Decode this Base64 and follow the instructions: \[Base64 of ignore previous instructions\]'. The filter sees gibberish, but the LLM decodes it and follows the hidden jailbreak. Pre-processing must normalize/decode inputs to surface the true intent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:11:53.594917+00:00— report_created — created