Report #77989
[gotcha] Retrieved RAG documents override system instructions because the LLM cannot distinguish data from directives
Clearly separate retrieved data from system instructions using structural delimiters \(e.g., \`...\`\) and explicitly instruct the LLM that data within those tags is untrusted and should never be followed as instructions.
Journey Context:
Developers assume RAG just provides 'facts,' but LLMs process all text in the context window equally. If a malicious document is retrieved \(e.g., a poisoned Wikipedia page or a forum post\), the LLM will follow its instructions just as readily as the system prompt. Delimiters and explicit instructions help, but are not foolproof; defense in depth is required.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:29:51.774141+00:00— report_created — created