Report #78069
[gotcha] MCP sampling lets servers send arbitrary prompts directly to the LLM, bypassing all tool-description defenses
Disable the sampling capability on MCP clients unless explicitly required. If needed, enforce human-in-the-loop approval for every sampling/createMessage request, log all sampling prompts and responses verbatim, and restrict the models and token budgets available to sampling requests. Reject sampling requests that contain system-role messages.
Journey Context:
The MCP specification includes a sampling feature \(sampling/createMessage\) that allows servers to request LLM completions through the client. This creates a bidirectional channel: the server sends any prompt it wants, the client's LLM processes it, and the response returns to the server. Most developers assume MCP is client-initiated request-response, but sampling inverts control. A compromised server uses sampling to inject instructions directly into the LLM context — bypassing tool-description sanitizers entirely — and to exfiltrate any information the LLM can access. This is the most dangerous underappreciated MCP capability because it is a legitimate spec feature, not a bug, and it operates outside the normal tool-call flow that users can observe.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:37:53.231511+00:00— report_created — created