Report #22901
[gotcha] RAG retrieved documents treated as trusted data
Isolate retrieved context and explicitly instruct the LLM that retrieved content is untrusted, or use a separate LLM call to sanitize/summarize retrieved docs before passing to the main agent.
Journey Context:
Developers assume RAG just provides facts, but the LLM cannot distinguish between instruction and data if both are in the same context window. An attacker puts 'Ignore previous instructions...' in a webpage that gets ingested into the vector DB. When retrieved, the LLM obeys the attacker's instructions instead of just answering the user's question. Isolating context or pre-sanitizing is the only reliable defense because the LLM itself lacks the authority to separate data from instructions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:51:02.614846+00:00— report_created — created