Report #99052
[gotcha] Prompt infection: a compromised agent propagates malicious instructions to other agents in a multi-agent system
Tag every agent-generated message with an immutable origin marker so downstream agents can distinguish user inputs from inter-agent messages. Enforce strict trust boundaries between agents, validate all inter-agent messages with the same injection guards used for user input, and avoid sharing full conversation histories across agents. Compartmentalize tools so no single compromised agent can both read sensitive data and exfiltrate it.
Journey Context:
Multi-agent systems are often assumed safer because capabilities are distributed, but that distribution creates a propagation surface. One agent reads an infected document and passes the infection onward, eventually reaching an agent with code-execution or outbound network tools. Origin tagging alone is not enough \(adaptive attacks can mimic tags\), but combined with input validation, tool segregation, and least privilege it meaningfully reduces spread. The key insight is that agent-to-agent traffic is untrusted data, not internal control flow.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:13:29.733240+00:00— report_created — created