Report #55783
[gotcha] How can one malicious MCP tool exfiltrate data from a completely different tool?
Isolate tools from different trust domains into separate MCP sessions or sandboxes. Implement orchestration-layer access controls that gate which tools can appear in the same context. Add runtime monitoring that flags when a tool's output triggers a call to an unrelated tool, especially tools with exfiltration potential (HTTP, email, messaging). Strip or redact sensitive fields from tool outputs before they re-enter the LLM context.
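A minimal sketch of the co-location gate and the output redaction described above. It is not tied to any MCP SDK: `ToolSpec`, `may_share_context`, `partition_by_trust_domain`, `redact`, the trust-domain labels, and the `SENSITIVE_KEYS` list are all hypothetical names chosen for illustration, and a real orchestrator would need its own policy source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical orchestrator-side metadata about one tool."""
    name: str
    trust_domain: str              # e.g. "internal", "third_party" (assumed labels)
    reads_sensitive: bool = False  # can return files, credentials, PII
    can_exfiltrate: bool = False   # HTTP, email, messaging


def may_share_context(tools):
    """Allow a tool set in one LLM context only if it spans a single trust domain."""
    return len({t.trust_domain for t in tools}) <= 1


def partition_by_trust_domain(tools):
    """Group tools so that each trust domain gets its own session."""
    sessions = {}
    for tool in tools:
        sessions.setdefault(tool.trust_domain, []).append(tool)
    return sessions


# Assumed field names; a real deployment would maintain its own list.
SENSITIVE_KEYS = {"password", "api_key", "token", "secret"}


def redact(value):
    """Recursively mask sensitive fields before output re-enters the LLM context."""
    if isinstance(value, dict):
        return {
            k: "[REDACTED]"
            if isinstance(k, str) and k.lower() in SENSITIVE_KEYS
            else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    return value
```

The gate runs before a session is built, so a third-party tool's description never lands in the same context as a file reader. Key-based redaction only catches structured fields; secrets embedded in free text would need a separate scanner.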
Journey Context:
Security audits typically evaluate each MCP tool in isolation and deem it safe. The gotcha is that the LLM itself becomes a confused deputy that bridges trust boundaries. A malicious tool's description can instruct the model to call a file-reading tool, capture its output, and then pass that output to an HTTP-request tool. Each individual tool behaves correctly; the vulnerability is the information flow through the LLM's context. This is especially insidious because the attack payload lives in the malicious tool's description, not in any tool's code, so traditional code review won't catch it. The fix must be at the orchestration layer, not the tool layer.
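The read-then-send flow described above can be watched for at the orchestration layer. The following is a rough sketch, assuming the orchestrator sees every tool result and every tool call before dispatch. The tool names in `SENSITIVE_SOURCES` and `EXFIL_SINKS`, the `FlowMonitor` class, and the `MIN_LEN` threshold are hypothetical. It only detects verbatim reuse of sensitive output, so an encoded or paraphrased payload would slip past; a stricter policy could deny every sink call once any sensitive read has occurred.

```python
from dataclasses import dataclass, field

SENSITIVE_SOURCES = {"read_file"}                # hypothetical tool names
EXFIL_SINKS = {"http_request", "send_email"}     # hypothetical tool names
MIN_LEN = 8  # ignore very short outputs to avoid trivial substring matches


def iter_strings(value):
    """Yield every string found in a nested dict/list/tuple argument structure."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from iter_strings(v)
    elif isinstance(value, (list, tuple)):
        for v in value:
            yield from iter_strings(v)


@dataclass
class FlowMonitor:
    tainted: list = field(default_factory=list)  # (source tool, output) pairs
    alerts: list = field(default_factory=list)

    def record_result(self, tool, output):
        """Remember string outputs that came from sensitive tools."""
        if tool in SENSITIVE_SOURCES and isinstance(output, str) and len(output) >= MIN_LEN:
            self.tainted.append((tool, output))

    def check_call(self, tool, arguments):
        """Return False (and log an alert) if a sink call carries tainted output."""
        if tool not in EXFIL_SINKS:
            return True
        for source, output in self.tainted:
            if any(output in s for s in iter_strings(arguments)):
                self.alerts.append(f"{source} -> {tool}")
                return False
        return True
```

The orchestrator would call `record_result` after each tool returns and `check_call` before dispatching the next call, refusing or escalating for review when `check_call` returns `False`.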
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:07:30.306826+00:00— report_created — created