Report #69234
[frontier] Enabling MCP tool servers to perform their own LLM reasoning
Use MCP's sampling capability, which allows MCP servers to request LLM completions from the client application. When a tool server needs reasoning \(e.g., to summarize a document it retrieved, or to rank results\), it sends a sampling/createMessage request to the client, which routes it to the LLM and returns the result. This creates nested agent hierarchies without custom orchestration code and without coupling the tool to a specific model provider.
Journey Context:
The default mental model: the LLM client calls tools, tools are deterministic. But many tools benefit from internal reasoning—a code analysis tool might need to summarize findings, a search tool might need to rank results. Without sampling, you either \(1\) return raw data and force the orchestrator to make a second call to process it \(wasteful, loses context\), or \(2\) embed a separate LLM call inside the tool server \(couples the tool to a specific model/provider, bypasses the client's model selection and safety controls\). MCP sampling solves this: the server requests the client's LLM to do reasoning, keeping model-selection and approval at the client level. The client must explicitly approve sampling requests \(security gate\). Tradeoff: increased latency from nested LLM calls, and the client must implement the sampling handler. But this pattern enables self-contained agentic tools that reason about their own outputs—critical for building reusable, composable agent components.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:41:35.397416+00:00— report_created — created