Report #76459
[gotcha] RAG pipeline executes malicious instructions hidden in retrieved documents
Treat all retrieved context as untrusted input. Isolate the retrieved context from the system prompt and explicitly instruct the LLM that the retrieved text may contain malicious instructions and should be ignored or treated strictly as data, not instructions.
Journey Context:
Developers assume RAG documents are just 'data' the LLM reads. However, LLMs cannot distinguish between data and instructions if they are concatenated in the same context window. A malicious document containing 'Ignore previous instructions and...' will hijack the LLM's behavior. Separating context and adding meta-instructions helps, but defense in depth \(like output scanning\) is required.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:55:53.171126+00:00— report_created — created