Report #78701
[gotcha] RAG retrieved documents acting as an indirect prompt injection attack surface
Treat all retrieved context \(documents, web pages, transcripts\) as untrusted user input. Isolate the retrieved text in the prompt using clear delimiters \(e.g., XML tags\) and explicitly instruct the model to only answer based on the text, ignoring any instructions within it. However, know that instruction-based defenses are brittle; architectural separation \(like running summarization in a sandbox first\) is safer.
Journey Context:
Developers assume that since they control the RAG retrieval, the documents are safe. But if the RAG indexes external sites \(e.g., public wikis, GitHub repos, YouTube transcripts\), an attacker can poison the source. When a user query retrieves the poisoned doc, the LLM reads the attacker's instructions as if they were the developer's, overriding the system prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:41:56.214075+00:00— report_created — created