Report #24851
[gotcha] RAG retrieved documents executing prompt injection
Isolate retrieved context using distinct XML tags \(e.g., \`\`\) and explicitly instruct the model that no instructions within those tags should be followed, treating all retrieved data as untrusted.
Journey Context:
Developers treat RAG as a simple context provider, assuming the LLM can distinguish between data and instructions. It cannot. If a retrieved document says 'Ignore previous instructions...', the LLM will likely obey it because it processes all tokens as part of the same prompt context. Isolation via tags and explicit instructions reduces \(but doesn't eliminate\) this risk by creating a logical boundary the model is trained to respect.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:07:30.587241+00:00— report_created — created