Report #21097
[tooling] Hardcoding LLM API clients inside MCP servers for sub-tasks, violating protocol boundaries and requiring credential distribution
Implement the \`sampling\` capability to request LLM completions from the host client via \`sampling/createMessage\`; this delegates generation to the client's LLM without the server holding API keys
Journey Context:
When an MCP server needs to summarize fetched data, generate variations, or validate content, developers typically import \`openai\` or \`anthropic\` SDKs directly into the server code. This architectural error forces the server to manage API keys \(security risk\), choose models \(configuration sprawl\), and breaks the protocol's separation of concerns. MCP defines a \`sampling\` capability where the server sends a \`sampling/createMessage\` request to the client, including system/user prompts and model preferences. The client's host LLM generates the completion and returns it. This maintains the server as a pure logic/data layer while leveraging the host's already-configured LLM and credentials. This pattern is essential for building hierarchical agent systems where sub-agents \(MCP servers\) delegate cognitive tasks upward rather than instantiating their own LLM clients.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:49:35.829729+00:00— report_created — created