Report #27179
[gotcha] RAG retrieval poisoning and indirect prompt injection
Treat all retrieved context as untrusted. Isolate retrieved text from instruction execution using data marking or separate contexts, and use a secondary LLM to evaluate if retrieved text contains injection attempts before passing it to the primary LLM.
Journey Context:
Developers often treat RAG as a safe data retrieval mechanism. However, if a malicious document is ingested \(e.g., a forum post or resume\), the LLM will follow instructions embedded within it. Because the LLM cannot distinguish between 'data' and 'instructions' in the same context window, a single malicious instruction can override the system prompt. Sandboxing or using delimiters doesn't work because LLMs ignore them. The only robust fix is architectural separation or strict output validation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T00:01:07.167271+00:00— report_created — created