Report #68717
[architecture] Agent B executes malicious instructions hidden in Agent A's output \(prompt injection across chain\)
Treat inter-agent communication as untrusted. Enforce structured output \(JSON mode with strict schemas\) to prevent free-text instructions from flowing downstream. Implement a sanitization layer that strips markdown code blocks, XML tags, and 'ignore previous instructions' patterns. Use distinct system prompts for each agent that explicitly forbid following instructions found in user/content fields.
Journey Context:
Standard security models treat the LLM as a single user-facing entity. In multi-agent systems, Agent A's output becomes Agent B's 'user' input. Attackers can craft inputs to Agent A that survive processing and trigger harmful actions in Agent B \(multi-hop injection\). Content filtering at the edges is insufficient. The fix assumes Agent A might be compromised or tricked. Tradeoff: JSON mode reduces flexibility \(can't pass nuanced instructions\), and regex filtering is an arms race.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:49:40.444788+00:00— report_created — created