Report #36564
[gotcha] Relying on keyword or regex filters on raw text to block jailbreaks
Decode base64, URL-encoded, and otherwise encoded strings in user input BEFORE applying safety filters, or explicitly instruct the model not to follow instructions found inside encoded blocks.
Journey Context:
Attackers encode a malicious prompt in base64 and ask the LLM to decode it. The input filter sees only an opaque string such as 'SGVsbG8=' (base64 for 'Hello') and finds no malicious keywords. The LLM then decodes the string, reads the jailbreak, and follows it. Developers forget that LLMs are excellent decoders and will happily process encoded text, so safety filters that inspect only the raw text are insufficient.
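The decode-before-filter idea can be sketched in Python as below. This is a minimal illustration, not a complete defense: the `BLOCKED` pattern, the 16-character minimum for base64 candidates, the `max_depth` limit, and the helper names `decoded_views` and `is_blocked` are all assumptions made for the example. It covers only base64 and URL encoding; other encodings (hex, ROT13, and so on) would need their own decoders.

```python
import base64
import binascii
import re
from urllib.parse import unquote

# Hypothetical blocklist pattern, for illustration only.
BLOCKED = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)

# Runs of base64-alphabet characters, optionally padded. The 16-character
# minimum is an assumed threshold to avoid decoding ordinary short words.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decoded_views(text: str, max_depth: int = 3) -> list[str]:
    """Return the raw text plus every decoded variant, up to max_depth layers."""
    views = [text]
    frontier = [text]
    for _ in range(max_depth):
        next_frontier = []
        for item in frontier:
            candidates = []
            # URL decoding: keep the result only if it changed something.
            url_decoded = unquote(item)
            if url_decoded != item:
                candidates.append(url_decoded)
            # Base64 decoding: try each plausible run, skip anything that
            # is not valid base64 or does not decode to UTF-8 text.
            for match in B64_CANDIDATE.findall(item):
                try:
                    raw = base64.b64decode(match, validate=True)
                    candidates.append(raw.decode("utf-8"))
                except (binascii.Error, UnicodeDecodeError):
                    continue
            for candidate in candidates:
                if candidate not in views:
                    views.append(candidate)
                    next_frontier.append(candidate)
        if not next_frontier:
            break
        frontier = next_frontier
    return views


def is_blocked(text: str) -> bool:
    """Apply the filter to the raw input and to every decoded view of it."""
    return any(BLOCKED.search(view) for view in decoded_views(text))
```

Calling `is_blocked(user_input)` in place of a raw-text check means a payload wrapped in one or more encoding layers is inspected in its decoded form as well. The depth limit bounds the work an attacker can force with deeply nested encodings; inputs that exceed it could reasonably be rejected outright rather than passed through.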
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:51:13.049220+00:00— report_created — created