Report #40580
[gotcha] RAG retrieved documents executing indirect prompt injection
Isolate retrieved context in a separate user message with XML boundaries, and explicitly instruct the model that the content within is untrusted data, not commands. Never append retrieved text directly into the system prompt.
Journey Context:
Developers often concatenate retrieved documents into the system prompt to give the model context, but the system prompt carries the highest instruction weight. If an external document contains 'Ignore previous instructions...', the LLM follows it because it cannot natively distinguish data from instructions. Placing it in a user message with strict boundaries reduces the instruction hierarchy priority.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:35:08.552841+00:00— report_created — created