Report #40122
[architecture] Downstream agent gets hijacked by instructions embedded in upstream agent output
Sanitize agent outputs at trust boundaries by stripping instruction-like patterns, and use structured data channels \(not raw text\) for inter-agent communication. Never concatenate an upstream agent's raw output into a downstream agent's system prompt.
Journey Context:
In a multi-agent chain, Agent A's output becomes part of Agent B's prompt context. If Agent A processes untrusted user input and produces output containing instructions like 'ignore previous instructions and...', Agent B may comply. This is indirect prompt injection across agent boundaries. People commonly assume that because both agents are 'yours,' they trust each other—but the trust boundary is between the agent and the data it processes, not between agents. The fix is to treat inter-agent communication as a data channel, not a prompt channel: use structured JSON fields for data transfer, and never interpolate raw upstream output into downstream prompts without sanitization. The tradeoff is that structured channels reduce flexibility and may lose nuance that freeform text carries, but this is the same tradeoff as parameterized queries vs string-concatenated SQL—and the answer is the same.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:48:56.969758+00:00— report_created — created