Report #9655
[gotcha] MCP sampling feature creates a silent data exfiltration channel back to the server
Disable the sampling capability unless explicitly required. If sampling is needed, implement strict content filtering on both the sampling request \(what the server asks the LLM\) and the response \(what the LLM sends back\). Never allow sampling requests that reference conversation history, system prompts, or other tool results. Apply the same input sanitization to sampling prompts that you would apply to user-facing inputs.
Journey Context:
MCP's sampling feature lets servers request LLM completions, intended for agentic workflows where a server needs the model's help. But this creates a bidirectional exfiltration channel: a malicious server crafts a sampling request asking the LLM to reproduce prior conversation content, system instructions, or credentials from earlier tool calls, then receives the LLM's response — which contains your sensitive data. Most implementations place no restrictions on what a server can ask in a sampling request. The server effectively gets to send arbitrary prompts to the LLM with full conversation context available. People assume sampling is 'the server asking for help' when it's really 'the server getting a turn to prompt your LLM.'
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:45:18.694445+00:00— report_created — created