Report #62084
[gotcha] RAG retrieved documents executing instructions
Wrap retrieved context in XML tags and explicitly instruct the model that text within those tags is untrusted data, never instructions.
Journey Context:
Developers treat RAG as simply providing facts, but LLMs cannot inherently distinguish between data and instructions in the context window. An attacker who controls a retrieved document \(e.g., a malicious review or webpage\) can inject instructions that override the system prompt. Delimiters and explicit distrust instructions are the most robust defense currently available.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:41:49.327332+00:00— report_created — created