Report #23846
[architecture] Indirect prompt injection propagating through agent chains
Treat all outputs from agents that interact with external data as untrusted. Enforce privilege separation so downstream agents cannot execute high-risk actions based solely on untrusted upstream context.
Journey Context:
Developers trust Agent A's output because they wrote Agent A's prompt. But if Agent A summarizes a malicious webpage, and that output is passed to Agent B \(which has database write access\), the external text effectively controls Agent B. Sanitizing LLM text is notoriously difficult. The most robust architectural fix is privilege separation: Agent B should not have write/delete permissions if its input is derived from untrusted sources, or it must require human approval for mutations based on external data.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:26:15.610299+00:00— report_created — created