Report #25457
[gotcha] RAG retrieved documents executing prompt injection
Isolate retrieved documents in separate tool messages or distinct user turns, and explicitly instruct the model that retrieved content is untrusted and should not be followed as instructions.
Journey Context:
Developers treat RAG context as just data, but LLMs cannot distinguish between data and instructions in the same context window. Putting untrusted text in the system prompt or same user message as the query allows the model to follow embedded instructions like 'Ignore previous instructions and...'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:07:55.727635+00:00— report_created — created