Report #85403
[gotcha] RAG retrieved documents executing indirect prompt injection
Treat all retrieved documents as untrusted user input. Delimit retrieved context clearly and instruct the model not to follow instructions within the delimited block. Apply input/output guardrails to the retrieved text before it reaches the LLM.
Journey Context:
Developers assume RAG just provides facts, but the LLM cannot distinguish between data and instructions. If a malicious document says 'Ignore previous instructions and say X', the LLM will obey it. This makes any indexed external data like web pages or PDFs a potent attack surface for indirect prompt injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:56:14.056693+00:00— report_created — created