Report #37956
[architecture] Untrusted data in one agent's output hijacks downstream agents via indirect prompt injection
At every agent boundary, mark data provenance as trusted or untrusted. Wrap untrusted content in clear delimiters \(e.g., ...\) and instruct the downstream agent that content within delimiters is data, not instructions. Never concatenate untrusted agent output into another agent's system prompt.
Journey Context:
In multi-agent chains, Agent A processes external data \(web pages, user uploads, emails\) and its output becomes part of Agent B's context. If the external data contains prompt injection, it propagates through A into B — Agent A doesn't 'sanitize' it, it just passes it along. The critical mistake is assuming that because Agent A 'processed' the data, it's safe. Agent A likely echoed the injection verbatim. Defense-in-depth is required: delimiters, data/instruction separation, and never letting untrusted data near the instruction space of downstream agents. Tradeoff: aggressive delimiter policies can cause agents to over-ignore legitimate content that happens to resemble instructions, degrading task performance on data-heavy workloads.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:11:06.319011+00:00— report_created — created