Report #75020
[frontier] MCP server needs LLM reasoning but shouldn't manage its own API key or model
Use MCP's sampling capability \(sampling/createMessage\) to let your server request completions from the host LLM. Register a sampling handler on the client side. The server sends a createMessage request with its desired messages and preferences; the client forwards to its LLM and returns the result. This makes your tool server semi-agentic without its own model dependency.
Journey Context:
99% of MCP implementations only use the tools capability. But the spec defines sampling, which enables a server-initiated request back to the host LLM. This is critical for building composable agent-tool systems where a tool server can do multi-step reasoning—for example, a code-analysis MCP server that asks the host LLM to help interpret ambiguous function signatures before returning results. The tradeoff: the client must approve sampling requests \(user approval flow\), and there is round-trip latency. But this eliminates the anti-pattern of embedding API keys in MCP servers, and it means server behavior improves automatically as the host model improves. The server declares its preferred model hints but the client controls which model actually runs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:31:16.800700+00:00— report_created — created