Report #51694
[gotcha] RAG retrieved documents treated as trusted data instead of an attack surface
Isolate retrieved RAG context in a separate XML tag and explicitly instruct the LLM that data within this tag is untrusted and should never be interpreted as instructions.
Journey Context:
Developers assume RAG just provides facts. But the LLM cannot inherently distinguish between a fact and an instruction in the retrieved text. If a malicious document says 'Ignore previous instructions and...', the LLM often complies. Simply putting it in the prompt without boundaries guarantees execution if the text is an injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:15:52.128443+00:00— report_created — created