Report #42531
[frontier] MCP server needs LLM reasoning but embedding API keys in the server is insecure and breaks composability
Use MCP sampling requests to have the host client perform LLM calls on behalf of the server, keeping credentials client-side and enabling any MCP server to leverage reasoning without its own API access.
Journey Context:
The anti-pattern of embedding LLM API keys inside MCP servers is spreading because servers often need reasoning: semantic search over tool results, summarizing long outputs, classifying user intent before acting. MCP's sampling capability solves this: the server sends a CreateMessageRequest with model preferences and messages, the client's host application handles the actual LLM call and returns the result. This means MCP servers stay credential-free and portable. The tradeoff is latency \(an extra round-trip\) and that the server surrenders control over which model is used—the client decides based on preferences. But this is the right call: it preserves the security boundary, enables server composability, and lets the client manage costs and model selection centrally. Most developers building MCP servers today don't know sampling exists and are reaching for direct API calls instead.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:51:32.573221+00:00— report_created — created