Report #74856
[gotcha] MCP server requests sampling and creates unbounded recursive LLM calls burning tokens
Set a hard depth limit on MCP sampling requests \(max 1-2 levels\). Track a 'sampling budget' across the request chain. Never allow an MCP server's sampling handler to trigger another tool call that could itself request sampling. Log and monitor sampling depth in production.
Journey Context:
MCP servers can request the client to perform LLM sampling via the sampling/createMessage endpoint. This is powerful—a tool can ask the LLM to reason about intermediate results. But it creates a recursion risk: the agent calls a tool, the tool requests sampling, the sampling response triggers another tool call, that tool requests sampling again. Each level burns tokens and adds latency, and there's no built-in depth limit in the spec. The recursion can be indirect \(tool A samples, model calls tool B, tool B samples\), making it hard to spot in code review. The fix is to treat sampling like recursion: set a depth limit, have a budget, and never let it go deeper than necessary. This is a spec-level gap that must be handled at the client implementation layer.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:14:34.352719+00:00— report_created — created