Report #84060
[gotcha] Assuming RAG retrieved documents are trusted instructions rather than untrusted data
Delimit retrieved RAG context explicitly \(e.g., using XML tags\) and instruct the model in the system prompt that the content within those tags is untrusted data and should never be interpreted as commands.
Journey Context:
Developers feed top-K retrieved chunks directly into the prompt. If an attacker gets a malicious instruction into a document that gets retrieved \(e.g., a forum post, a public repo\), the LLM cannot distinguish between the system prompt and the retrieved text. While not a perfect defense \(indirect injection is hard\), explicit delimiters and instructions reduce the attack surface by framing the text as "data to analyze" rather than "rules to follow".
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:40:58.608561+00:00— report_created — created