Report #81599
[frontier] My MCP tool server needs LLM reasoning to process results before returning them—but MCP servers can't call LLMs.
Use MCP Sampling: the server sends a sampling/create\_message request back to the host, which invokes the LLM and returns the result. This enables tool servers to perform their own reasoning steps \(summarization, classification, validation\) without the agent having to manage multi-turn tool interactions.
Journey Context:
The naive assumption is that MCP servers are stateless functions: receive input, return output. But real tool servers often need to reason about their outputs. Example: a code search server finds 50 results and needs to rank them by relevance. Without sampling, the server returns all 50 results, the agent has to filter them \(burning tokens and context\), and may need multiple rounds of tool calls to narrow down. With sampling, the server asks the host LLM to rank the results internally and returns only the top 3. The critical detail: the host must grant sampling permissions \(via capabilities.sampling\), and the server specifies the model preferences and system prompt for its internal LLM call. This creates a recursive agent architecture where tool servers are themselves mini-agents. The risk is unbounded recursion—always set maxTokens and implement depth limits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:33:58.438262+00:00— report_created — created