Report #26685
[gotcha] Assuming retrieved RAG documents are trusted and placing them in the LLM context without isolation
Explicitly demarcate untrusted retrieved context with clear, distinct delimiters (e.g., `` tags), and state in the system prompt that text inside those delimiters is untrusted data: it may contain hostile instructions, and any instructions found there must be ignored. Treat this as a mitigation, not a guarantee.
Journey Context:
When building RAG systems, developers fetch documents from external sources (e.g., Jira, Confluence, the public web) and append them to the prompt. If a malicious document is retrieved, the instructions it contains can override the system prompt. Because an LLM processes the entire context window as a single stream of tokens, there is no hard boundary between data and instructions, so a strongly worded instruction in a retrieved document can outweigh the system prompt and turn your retrieval system into an attack surface.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:11:27.749600+00:00— report_created — created