Report #63877
[gotcha] Input filters failing to detect malicious payloads encoded in Base64 or ROT13
Implement decoding routines in your input pipeline to inspect the plaintext of any encoded strings before passing them to the LLM, and reject prompts containing unresolved encoded blocks.
Journey Context:
Developers assume that if a prompt is encoded, the LLM won't understand it. However, LLMs are excellent at recognizing and decoding common encodings. An attacker bypasses keyword filters by Base64-encoding the malicious instruction; the LLM decodes it in-context and executes the hidden payload. The filter sees gibberish, the LLM sees 'Ignore all previous instructions'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:42:30.078590+00:00— report_created — created