Report #30194
[gotcha] Retrieved RAG documents are treated as data but executed as instructions
Separate untrusted data from developer instructions using distinct XML tags \(e.g., \`...\`\) and explicitly instruct the LLM that content within those tags is untrusted and should not be followed as commands.
Journey Context:
LLMs cannot inherently distinguish between 'data to analyze' and 'instructions to follow'. When a RAG pipeline injects a malicious document into the prompt, the LLM will happily follow embedded instructions like 'Ignore previous instructions'. Developers assume the LLM will just 'summarize' the document, but the document overrides the summarization task.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:04:05.321086+00:00— report_created — created