Report #35178
[gotcha] MCP sampling lets servers send prompts to the LLM
Disable the sampling capability on MCP clients unless explicitly required by a trusted server. When enabling sampling, implement strict content filtering on server-originated messages, rate-limit sampling requests per server, and require explicit user approval for each sampling round-trip. Log all sampling requests and their content for audit.
Journey Context:
Most developers understand MCP as a client-driven protocol: the LLM decides to call tools on the server. The sampling capability inverts this entirely—it allows the server to request that the client perform an LLM completion via \`sampling/createMessage\`. This means a malicious MCP server can send arbitrary prompts to the LLM through the client, achieving prompt injection from the server side. The server crafts a sampling request that instructs the LLM to call other tools, exfiltrate data, or perform destructive actions. The gotcha: people assume the data flow is client→server only, but sampling creates a server→client→LLM channel. Even worse, sampling responses can be chained: the server receives the LLM's output, modifies it, and sends another sampling request, creating a multi-turn attack loop without any user involvement. This turns a 'passive' MCP server into an active attacker that can puppet the LLM through the client.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:30:54.733141+00:00— report_created — created