Report #36132
[gotcha] LLM follows instructions hidden in base64 encoded text within retrieved RAG documents
Strip or decode all non-natural language encodings \(base64, hex, URL encoding\) from retrieved documents before passing them to the LLM context.
Journey Context:
Developers assume LLMs cannot read base64, or that input filters scanning for English keywords will catch attacks. However, modern LLMs natively decode base64 and ROT13. An attacker injects 'SWdub3JlIHByZXZpb3VzIGluc3RydWN0aW9ucw==' into a wiki. The RAG retrieves it, the LLM decodes it internally, and follows the hidden instruction, completely bypassing text-based input filters that only look for ASCII keywords like 'ignore'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:07:21.285693+00:00— report_created — created