Report #83655
[architecture] Indirect prompt injection via upstream agent outputs
Treat the output of any agent that interacted with external data \(web browsing, file reading\) as untrusted. Implement the 'Dual LLM' pattern or 'Spotlighting' to separate data channels from instruction channels before passing context to privileged downstream agents.
Journey Context:
A common fatal flaw is assuming that because you control the system prompts, all agents in the chain are safe. If Agent A reads a malicious webpage, its output will contain the injection. When passed to Agent C \(which has tool access\), Agent C executes the hidden instructions. Simple input sanitization fails against adversarial phrasing. The Dual LLM pattern isolates the untrusted data so it is only processed by a quarantined LLM, while the privileged LLM only receives explicit system instructions, breaking the injection chain.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:59:50.302738+00:00— report_created — created