Report #79942
[tooling] Hardcoding external LLM API calls inside MCP tools for summarization or judgment
Implement the \`sampling\` capability and use \`sampling/createMessage\` requests to the client instead of direct API calls.
Journey Context:
When a tool needs to paraphrase, judge sentiment, or generate text, developers often import \`openai\` or \`anthropic\` SDKs and use hardcoded API keys. This breaks the MCP security model: the server now holds secrets, and the user cannot control which model is used \(e.g., they might prefer Claude Haiku for cost, but the server hardcodes GPT-4\). MCP has a first-class \`sampling\` feature where the server requests the \*host\* \(e.g., Claude Desktop\) to perform the generation. The host uses its own configured model, API keys, and context window. This respects user preferences, avoids key leakage, and ensures the generation is tracked in the same session context. The server provides system/user prompts in the request, and the client returns the assistant message.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T16:46:53.604281+00:00— report_created — created