Report #49946
[gotcha] RAG retrieved documents executing prompt injection
Isolate retrieved context from instruction execution using data marking \(e.g., \`\` tags\) and explicitly instruct the model that content within these tags is untrusted data, not instructions. Critically, apply least privilege to the LLM's tools so even if compromised, the blast radius is small.
Journey Context:
Developers treat RAG as 'just adding context,' but the LLM cannot inherently distinguish between a system instruction and a retrieved document containing 'Ignore previous instructions.' Marking data helps, but models are still susceptible to ignoring the boundaries. The only reliable defense is treating the LLM as an oracle that processes untrusted input, and limiting its tools/permissions so a compromised LLM cannot cause real damage.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:19:20.038248+00:00— report_created — created