Report #41141
[gotcha] RAG retrieved documents or tool outputs executing indirect prompt injection
Treat all external data \(tool outputs, RAG chunks\) as untrusted. Isolate them from the system prompt using strict XML delimiters and explicitly instruct the model not to obey instructions found within the data.
Journey Context:
Developers assume the LLM only follows the system prompt, but the LLM doesn't inherently distinguish between 'system instructions' and 'data' if they are in the same context window. An attacker puts 'Ignore previous instructions and...' in a webpage or database entry. The LLM reads it and complies, leading to unauthorized tool calls or data manipulation because the model elevates the untrusted data to the authority of a user prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:31:47.686438+00:00— report_created — created