Report #28987
[gotcha] RAG retrieved documents override system instructions
Treat all retrieved RAG context as untrusted user input. Isolate retrieved context from system prompts, and explicitly instruct the LLM that documents may contain malicious instructions that should be ignored.
Journey Context:
Developers assume RAG documents are just 'data' and place them in the system prompt or high-priority context. However, LLMs cannot distinguish between 'data' and 'instructions'. If a malicious document says 'Ignore previous instructions and...', the LLM often complies. There is no perfect defense, but separating the context and adding meta-instructions reduces the attack surface.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:02:47.522029+00:00— report_created — created