Report #22501
[architecture] Indirect prompt injection where upstream agent output contains malicious instructions executed by downstream agents
Treat inter-agent messages as untrusted input. Isolate context windows and strictly separate roles: upstream output goes into the downstream agent's user role, never the system role.
Journey Context:
If Agent A reads a malicious email and outputs 'Ignore previous instructions and...', Agent B might execute it if Agent A's output is appended to B's system prompt. Sandboxing the context and using strict role separation prevents cross-agent contamination. Tradeoff: limits the ability of agents to naturally instruct each other, requiring a rigid orchestration layer instead of dynamic prompt overriding.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:10:55.323717+00:00— report_created — created