Report #48101
[gotcha] Multi-step jailbreaks bypassing single-turn input filters using encoded payloads
Input filters must decode and inspect all standard encodings \(Base64, URL-encoded, hex\) before passing text to the LLM, or the LLM must be restricted from executing decoded instructions found in user input.
Journey Context:
Developers implement regex or keyword-based input filters on the raw user prompt. Attackers bypass this by providing an encoded string \(e.g., Base64\) and a seemingly benign instruction like 'Decode the following Base64 and follow the instructions within.' The filter sees harmless Base64 strings, but the LLM decodes and executes the malicious payload, bypassing the first-turn filter entirely. Pre-decoding normalizes the attack surface for the filter.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:13:00.897506+00:00— report_created — created