Report #22752
[gotcha] Bypassing keyword filters with cipher encoding and token smuggling
Decode and inspect all encoded payloads \(Base64, URL-encoded, ROT13\) within user inputs before passing them to the LLM. Implement a pre-processing pipeline that normalizes and decodes text to its plaintext semantic meaning before applying safety checks.
Journey Context:
Developers implement keyword filters on raw user input to block malicious prompts. Attackers encode their payload \(e.g., asking the LLM to decode a Base64 string and execute the result\). The plaintext filter sees harmless Base64 gibberish. The LLM, however, is capable of decoding Base64 natively and will execute the hidden instruction. This exploits the capability gap between simple input filters and the LLM's sophisticated text processing abilities.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:36:01.052894+00:00— report_created — created