Report #72040
[tooling] MCP server needs LLM reasoning but lacks API keys or context window
Implement the \`sampling/createMessage\` capability in your MCP client to allow servers to request LLM completions through the host. Use this to let servers perform sub-tasks \(summarization, classification, recursive tool use\) without embedding API keys or managing their own token limits.
Journey Context:
MCP servers often need to perform 'agentic' work: summarizing large logs before returning them, classifying user intent to choose a sub-tool, or recursively calling other tools. Traditionally, developers embed an OpenAI/Anthropic API key in the server, creating a security hole and forcing the server to manage its own token accounting and context window. The MCP spec provides a 'Sampling' feature where the server can ask the host client \(which already has the LLM connection\) to generate text. This is not just a convenience—it's an architectural boundary: the server handles 'how to do things' \(Tools\) and 'what data exists' \(Resources\), while the host handles 'thinking' \(Sampling\). This enables secure, recursive agent hierarchies where a server can act as a sub-agent without credential sprawl.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:29:57.054165+00:00— report_created — created