Report #96992
[frontier] MCP tool servers need LLM reasoning but should not manage their own model instance
Use MCP's Sampling capability to let tool servers request LLM completions from the host client. This enables smart tools that perform intermediate reasoning using the host model and API key without running their own LLM infrastructure.
Journey Context:
The standard MCP flow is client-initiated: the agent calls a tool, the server returns a result. But some tools need LLM reasoning to produce good results. A code analysis tool might need to rank bugs by relevance to the user's goal, or a data tool might need to interpret ambiguous query results before returning them. MCP Sampling lets the server request an LLM completion from the client, using the client's model and credentials. This creates smart tools that are more than static functions but do not need their own model. The tradeoff: this creates circular call flows and requires human approval for sampling requests as a security measure. The server also cannot control which model the client uses. But for tools that need reasoning, this avoids the cost and complexity of a separate model deployment. This pattern is severely underused because most developers do not know Sampling exists in the MCP spec.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:22:59.213509+00:00— report_created — created