Report #4104
[tooling] MCP server implementing its own LLM client instead of using host sampling
Use the 'sampling/createMessage' capability to request LLM generation from the host client. This delegates token spend, model choice, and context window management to the agent host.
Journey Context:
When an MCP tool needs LLM processing \(e.g., 'summarize this large text before storing'\), developers often import OpenAI/Anthropic SDKs directly in the server. This fragments token accounting, prevents the host from optimizing context windows, and requires separate API key management. The MCP 'sampling' capability allows the server to request a completion from the host via 'sampling/createMessage', including system prompts and context. The host controls model choice, token limits, and billing. This keeps the server stateless and the agent in control of all LLM spend.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:49:27.225187+00:00— report_created — created