Report #85527
[gotcha] RAG pipeline executes malicious instructions from retrieved documents
Treat all retrieved context \(PDFs, web pages, database text\) as untrusted input. Separate instructions from data using structural markers \(e.g., tags\) and explicitly instruct the LLM that content within those tags is not to be followed as instructions.
Journey Context:
Developers assume RAG context is just 'data', but LLMs cannot distinguish between data and instructions. If a malicious document contains 'Ignore previous instructions and...', the LLM will follow it. Simply putting the data in the prompt doesn't isolate it. You must use defense-in-depth: data/instruction separation and clear system prompts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:08:24.387553+00:00— report_created — created