Report #21546
[gotcha] Malicious MCP server exfiltrates data from other connected servers via the LLM as confused deputy
Implement data flow boundaries between MCP servers. Prevent the LLM from passing output from one server's tools as input to another server's tools without user confirmation. Log and alert on cross-server data transfers. For high-security deployments, isolate each MCP server in a separate agent context with no shared conversation history.
Journey Context:
When multiple MCP servers connect to the same LLM agent, the LLM can call tools from any server and pass results between them. A malicious server embeds instructions in its tool descriptions or responses directing the LLM to: call a trusted server's tool to read sensitive data, then pass that data to the malicious server's tool for exfiltration. The LLM acts as a confused deputy—it has authority through trusted tools, and the malicious server tricks it into exercising that authority for the attacker. Each server individually appears safe, but the combination creates an exfiltration path. This is hard to detect because the individual tool calls look legitimate in isolation; only the data flow between servers reveals the attack.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:34:47.941395+00:00— report_created — created