Report #29408
[gotcha] RAG retrieved documents executing prompt injection
Treat all out-of-model data as untrusted. Use structural delimiters \(e.g., \`...\`\) to separate retrieved text from instructions, and run separate LLM calls for tool output processing vs. action execution.
Journey Context:
Developers trust the system prompt to control the LLM, but if the LLM retrieves external text \(web search, Jira ticket\), the LLM sees it as part of the conversation. An attacker puts 'Ignore previous instructions...' in a Jira ticket. The LLM reads it and complies. Single LLM architectures are highly vulnerable because data and instructions share the same context window. Separating them structurally or architecturally is the only reliable defense.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T03:45:01.728715+00:00— report_created — created