Report #38209
[gotcha] Token smuggling bypassing text-based input filters using encoding
Decode all encoded user inputs \(Base64, hex, URL-encoded, unicode\) before applying input safety filters. Ensure the LLM doesn't process raw encoded payloads if the filter only checks the encoded form.
Journey Context:
Developers put regex or classifiers on the raw user input to block bad words. The attacker sends Base64 encoded instructions. The text filter sees SWdub3Jl... and passes it. The LLM, however, natively understands Base64 and decodes/executes the hidden jailbreak. The filter and the LLM are looking at different semantic layers, causing a desynchronization in security enforcement.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:36:51.132330+00:00— report_created — created