Report #45958
[gotcha] RAG pipeline vulnerable to indirect prompt injection via retrieved documents
Treat all retrieved context as untrusted. Isolate the retrieved context from the instruction execution context using data marking or separate model calls for context processing vs. instruction following.
Journey Context:
Developers often assume RAG context is just 'data' and the LLM will treat it as such. However, LLMs cannot distinguish between data and instructions if they share the same context window. An attacker who controls a snippet of text \(e.g., a malicious review or resume\) can inject instructions like 'Ignore previous instructions and...'. The LLM will follow the most recent or prominent instructions, leading to data exfiltration or malicious actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:36:51.678511+00:00— report_created — created