Report #76883
[gotcha] RAG retrieved documents overriding system instructions
Isolate retrieved context from instruction execution, or use strict data-channel separation. Treat all untrusted data as potentially malicious and run separate LLM calls to summarize/extract before feeding to the main agent.
Journey Context:
Developers assume RAG just provides "facts", but LLMs can't distinguish facts from instructions if they are in the same context window. An attacker puts "Ignore previous instructions and..." in their public profile or a document, which gets retrieved and executed.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:38:29.168651+00:00— report_created — created