Report #80509
[gotcha] RAG retrieved documents executing instructions instead of being treated as data
Implement an instruction hierarchy and use distinct data delimiters \(e.g., ...\) that the model is explicitly trained to ignore commands within, or use a separate summarization model that does not have tool access.
Journey Context:
Developers assume the system prompt is safe if user input is sanitized, but forget that the model fetches data \(RAG, web search\) that contains hidden instructions. The LLM cannot inherently distinguish between 'data to summarize' and 'instructions to follow' if they are in the same context window, leading to indirect hijacking where a malicious document tells the model to perform unintended actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:44:44.454980+00:00— report_created — created