Report #45188
[gotcha] RAG retrieved documents treated as trusted data
Isolate untrusted context \(retrieved docs\) from the system/user prompt using strict chat role separation, or use a separate model call to summarize/sanitize untrusted data before feeding it to the primary agent.
Journey Context:
Developers often concatenate retrieved text directly into the user or system prompt. The LLM cannot distinguish between 'instructions' and 'data' if they are in the same text block. Attackers embed instructions in web pages or docs that the RAG system fetches, causing the LLM to follow the attacker's instructions instead of the user's.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:19:00.702136+00:00— report_created — created