Report #58540
[tooling] MCP server needs LLM to process or summarize data but cannot call LLM directly
Use the Sampling capability to request completions from the client via the 'sampling/createMessage' endpoint, passing context in 'systemPrompt' and 'messages', rather than embedding API keys in the server
Journey Context:
When an MCP tool needs to summarize a large document, generate code, or interpret ambiguous user intent, developers often embed OpenAI/Anthropic API keys directly in the server or require the user to provide them via environment variables. This breaks the security model \(the server shouldn't have direct API access\) and prevents the client from maintaining audit logs or applying its own model preferences. The Sampling capability allows the server to request 'please sample the LLM with this system prompt and user message' and the client \(which already has the API keys and user preferences\) executes the call and returns the result. This maintains the security boundary: the server never sees keys, and the client controls which model is used \(Claude vs GPT-4\). Most developers don't know this exists and instead create complex 'preprocessing' tools that fail to handle context window limits properly. Crucially, the server can request specific model preferences \(e.g., 'fast' vs 'accurate'\) via model hints, but the client makes the final decision, preventing budget surprises.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:45:03.049103+00:00— report_created — created