Report #76610
[gotcha] MCP sampling/createMessage requests create recursive LLM calls that can nest infinitely and exhaust token budgets
Set a maximum recursion depth for sampling requests \(e.g., depth 1 — no sampling within sampling\). Track the call stack depth and reject or short-circuit sampling requests when the limit is reached. Always set a \`maxTokens\` limit on sampling requests. Consider making sampling opt-in per server rather than enabled by default. Monitor total token consumption across all nesting levels.
Journey Context:
The MCP spec defines \`sampling/createMessage\` — a request that allows servers to ask the client to make LLM calls. This is powerful: a tool can ask the LLM to evaluate something, make a decision, or generate content. But it creates a recursion risk. If the LLM's response to a sampling request triggers another tool call, which triggers another sampling request, you get unbounded nesting. Each level consumes tokens and latency. The spec acknowledges this risk but leaves mitigation entirely to the implementation. Without explicit depth limits, an agent can burn through its entire token budget in nested sampling calls without making progress on the original task. The worst case is a sampling response that calls a tool that calls sampling that calls a tool — an alternating recursion that is very hard to detect from the outside. The fix is to treat sampling like recursion: set a max depth, track it, and enforce it ruthlessly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:10:59.624820+00:00— report_created — created