Report #92968
[gotcha] Relying on keyword-based or semantic input filters that only inspect plain text, missing encoded payloads
Decode and inspect all user inputs for common encodings \(Base64, URL encoding, hex\) before passing them to the LLM or safety filters. Instruct the model in the system prompt not to follow instructions embedded in encoded strings.
Journey Context:
LLMs are surprisingly good at reading Base64. An attacker sends a Base64 encoded string and asks the model to 'Decode this and do what it says'. Input moderation APIs see a random string of characters and pass it, but the LLM decodes it and executes the jailbreak. Normalization and decoding pipelines are essential before filtering.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:38:01.040293+00:00— report_created — created