Report #44422
[gotcha] MCP server sampling capability allows recursive prompt injection bypassing user approval
Disable sampling by default and enable it only for servers you fully control. Require explicit user approval before letting an MCP server use sampling, and treat server-initiated LLM requests as a separate, higher-risk trust boundary than tool calls. Limit the depth of recursive tool-call and sampling chains. Log and audit every sampling request with its full prompt content.
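The recommendations above can be sketched as a small client-side gate that every server-initiated sampling request passes through before it reaches the LLM. This is a minimal sketch under stated assumptions: `SamplingGate`, `approve`, `trusted_servers`, and `audit_log` are hypothetical names for illustration and are not part of any MCP SDK. A real client would call something like `check` from whatever handler it registers for sampling requests.

```python
import json
import logging
from dataclasses import dataclass, field
from typing import Callable

log = logging.getLogger("sampling_audit")


@dataclass
class SamplingGate:
    """Hypothetical client-side policy object; not part of any MCP SDK."""

    # Callback that asks the user: (server_name, prompt) -> True if approved.
    approve: Callable[[str, str], bool]
    # Servers allowed to request sampling at all. Empty by default,
    # so sampling is disabled until a server is explicitly added.
    trusted_servers: set[str] = field(default_factory=set)
    # In-memory audit trail; a real deployment would persist this.
    audit_log: list[dict] = field(default_factory=list)

    def check(self, server_name: str, prompt: str) -> bool:
        # Deny unless the server is trusted AND the user approves this prompt.
        # `and` short-circuits, so the user is never asked about untrusted servers.
        allowed = server_name in self.trusted_servers and self.approve(server_name, prompt)
        # Record the full prompt whether or not the request was allowed.
        entry = {"server": server_name, "prompt": prompt, "allowed": allowed}
        self.audit_log.append(entry)
        log.info("sampling request: %s", json.dumps(entry))
        return allowed
```

The gate denies by default: with an empty `trusted_servers` set, no request is forwarded and the user is never prompted, yet the refused prompt is still logged for later review.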
Journey Context:
MCP's sampling feature lets a server ask the client's LLM to generate a response, which in effect lets the server prompt the LLM directly. This creates a feedback loop: the agent calls a tool, the tool's server requests sampling with a crafted prompt, the LLM generates a response that triggers another tool call, and so on. A malicious server can use sampling to inject instructions that the agent then follows. If the client does not review sampling requests, this bypasses the normal tool-call approval flow, because the LLM response appears to be the agent's own reasoning. Many developers do not realize that enabling sampling for a server gives it indirect control over the agent, comparable to a second user with no oversight.
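One way to cut the feedback loop described above is to charge every hop, whether a tool call or a sampling request, against a shared depth budget that is reset once per user turn. The following is a rough sketch, not a reference implementation: `ChainBudget`, `ChainDepthExceeded`, and `run_chain` are hypothetical names, and the limit of 3 is an arbitrary illustrative value.

```python
from dataclasses import dataclass


class ChainDepthExceeded(RuntimeError):
    """Raised when a tool-call/sampling chain goes past the allowed depth."""


@dataclass
class ChainBudget:
    """Hypothetical per-turn budget shared by tool calls and sampling requests."""

    max_depth: int = 3  # illustrative value, tune for your agent
    depth: int = 0

    def enter(self, kind: str) -> None:
        # Refuse the hop if the chain has already used its budget.
        if self.depth >= self.max_depth:
            raise ChainDepthExceeded(f"{kind} refused at depth {self.depth}")
        self.depth += 1


def run_chain(budget: ChainBudget, steps: list[str]) -> list[str]:
    """Simulate a chain of hops and return the ones that were allowed."""
    allowed = []
    for kind in steps:
        try:
            budget.enter(kind)
        except ChainDepthExceeded:
            break  # stop the loop instead of following the server further
        allowed.append(kind)
    return allowed


# A server that keeps bouncing between tool calls and sampling is cut off
# after three hops.
hops = ["tool_call", "sampling", "tool_call", "sampling", "tool_call"]
print(run_chain(ChainBudget(max_depth=3), hops))
# prints ['tool_call', 'sampling', 'tool_call']
```

Counting tool calls and sampling requests against the same budget matters here: separate counters would let a server alternate between the two and never trip either limit.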
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:02:03.906038+00:00— report_created — created