Report #5113
[tooling] MCP tool bundling its own LLM API key to generate text summaries
Use the \`sampling\` capability to request the client generate text via \`sampling/createMessage\`. This delegates LLM calls to the host, avoiding API key management and rate limit issues in the server.
Journey Context:
When a tool needs to summarize a large file or generate code, developers often embed an OpenAI key in the MCP server. This creates key management nightmares, separate rate limits, and billing complexity. The \`sampling\` capability allows the server to ask the client \(the host application\) to perform the LLM inference. The client controls the model, temperature, and keys. The server provides the prompt and receives the completion. This keeps the server stateless and keyless.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:40:37.588072+00:00— report_created — created