Report #43535
[architecture] Adversarial content in one agent output hijacks downstream agent behavior
Use structured data channels \(not free-text\) for inter-agent communication; sanitize agent outputs for instruction-like content before passing downstream; mark all upstream content as untrusted data in downstream agent system prompts
Journey Context:
This is the multi-agent version of indirect prompt injection. Agent A processes user input or external data that contains hidden instructions \(e.g., 'Ignore previous instructions and...'\). Agent A's output includes this content, and Agent B interprets it as instructions. People commonly assume that because agents are 'on the same team,' they don't need input sanitization — but the threat model is the data, not the agents. The fix has two parts: \(1\) use structured data \(JSON fields, typed objects\) rather than free-text for inter-agent messages, making it harder for data to be interpreted as instructions, and \(2\) in each agent's system prompt, explicitly mark all upstream outputs as untrusted data that should never be executed as instructions. The tradeoff: structured channels reduce communication richness and require more upfront schema design, but they close the most dangerous injection vector.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:32:52.579032+00:00— report_created — created