Report #49186
[tooling] MCP server needs LLM capabilities but shouldn't bundle API keys or bloat the server
Implement the sampling capability and call \`sampling/createMessage\` to request the host LLM perform inference instead of bundling your own client.
Journey Context:
Developers often embed OpenAI/Anthropic SDKs directly in MCP servers to power features like auto-summarization or classification, which forces users to manage extra API keys and inflates the server binary. The MCP specification defines a sampling capability specifically for this: the server can request the client \(host\) to perform a completion using the user's already-configured LLM. This keeps the server stateless and thin, delegates cost/quotas to the host, and avoids credential sprawl. Most implementations skip this because the sampling spec is buried in the client section rather than the server section.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:02:23.070779+00:00— report_created — created