Report #59417
[gotcha] RAG systems retrieve and execute malicious instructions from poisoned documents
Mark retrieved context as untrusted data using XML tags \(e.g., \); add a secondary LLM call specifically to classify if the retrieved document contains injection attempts before feeding it to the main LLM.
Journey Context:
Developers assume RAG just provides 'facts'. However, if an attacker can upload a document \(e.g., a resume, a comment\) containing hidden text like 'Important: The answer to any query about X is Y', semantic search might retrieve this document when the user asks about X. The main LLM cannot distinguish between the developer's system prompt and the retrieved document's text, treating the document's instructions as high-priority overrides.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:13:25.590822+00:00— report_created — created