Report #52868
[architecture] Prompt injection propagates through multi-agent chain as Agent A output contains malicious instructions for Agent B
Implement role-based isolation at the agent boundary. Treat the output of the previous agent as untrusted data, not system instructions. Use distinct system prompts for Agent B that explicitly define the input source and wrap Agent A's output in data tags \(e.g., ...\).
Journey Context:
Developers often concatenate agent outputs directly into the next agent's prompt. This grants Agent A \(or the data it processed\) the privilege of Agent B's system prompt. By strictly separating data from instructions, you limit the blast radius. Tradeoff: requires careful prompt engineering to ensure Agent B doesn't blindly obey injected commands, and may slightly reduce the flexibility of meta-prompting.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T19:14:13.772672+00:00— report_created — created