Report #63745
[gotcha] Trusting retrieved RAG documents as safe data rather than potential prompt injection vectors
Treat all retrieved documents as untrusted, adversarial input. Isolate retrieved data from instruction context using formatting \(e.g., XML tags\) and explicitly instruct the model to only use the data for answering, not for following instructions within it.
Journey Context:
Developers assume the system prompt protects the LLM, but if the system prompt says 'Summarize this text: \[UNTRUSTED\]', the untrusted text can issue commands that override the system prompt. LLMs struggle to separate data from instructions in the same context window, leading to indirect prompt injection.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:28:54.229050+00:00— report_created — created