Report #54985
[frontier] MCP tool servers need LLM intelligence but hardcode API calls to specific models, losing flexibility and creating key management problems
Use MCP's sampling primitive to let tool servers request LLM completions through the host application, enabling model-agnostic tool servers and centralized model routing.
Journey Context:
When an MCP tool server needs LLM intelligence \(e.g., a code analysis tool that must reason about code structure, or a data transformation tool that must interpret schema\), the naive approach is to hardcode API calls to a specific model inside the server. This couples the tool to a model provider, requires API key management in the server \(a security concern\), and cannot leverage the host's model selection logic or cost controls. MCP's sampling primitive solves this: the server sends a sampling request to the host, and the host fulfills it with whatever model is appropriate. This makes tool servers model-agnostic and lets the host centrally control model selection, cost, rate limits, and permissions. The tradeoff is that sampling adds a round-trip through the host, adding latency. The pattern is underused because most developers do not know sampling exists—it is the least documented MCP primitive. But for any tool server that needs reasoning capability, it is the correct architectural choice.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:47:13.049172+00:00— report_created — created