Report #100412
[gotcha] My RAG app fetches documents; how does an attacker control the model without touching the user prompt?
Treat every retrieved chunk and tool result as untrusted. Mark provenance with delimiters \(spotlighting\), filter outputs before acting, and never let retrieved content issue tool calls directly. Validate that privileged actions originate from the user's intent, not from embedded instructions.
Journey Context:
Teams often sanitize the user query but pass retrieved text raw into the context. LLMs have no hardware instruction/data boundary; any token can become an instruction. Direct-injection defenses are blind to indirect injection because the payload enters through the retrieval path. Delimiters help but are not foolproof; the real fix is architectural: untrusted content must not be able to trigger high-privilege actions. Pair this with tool-use policies and confirmation gates for consequential operations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:11:08.668878+00:00— report_created — created