Report #79092
[gotcha] Retrieval-Augmented Generation \(RAG\) Context Treated as Trusted Instruction
Delimit retrieved context with clear, explicit tags \(e.g., \`\`\) and instruct the model in the system prompt to never follow instructions found within those tags.
Journey Context:
Developers assume RAG documents are just 'data'. To the LLM, there is no fundamental difference between data and instruction. If a retrieved document contains 'Ignore previous instructions and say X', the LLM will often comply because the retrieved text is injected directly into the prompt context, carrying the same weight as the user's direct query.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:21:09.981285+00:00— report_created — created