Report #8202
[tooling] MCP server needs to ask the LLM a question or summarize data but cannot invoke the model directly
Use the \`sampling/createMessage\` request to ask the host client to sample from the LLM, enabling recursive agentic behavior without exposing API keys to the server.
Journey Context:
A server often encounters data too large to return raw \(e.g., a 1000-row table\) or needs to ask a clarifying question \(e.g., 'Which date format?'\). The naive approach is to return all data and let the agent summarize, wasting tokens, or to error out with 'too much data.' Some servers embed their own LLM client, but this requires API keys in the server process, violating the principle that the client controls model access and credentials. The MCP spec defines \`sampling\` capability: the server requests \`sampling/createMessage\` from the client, passing messages/context. The client then calls its configured LLM \(which might be Claude, GPT-4, etc.\) and returns the result. This allows 'recursion': the server acts as an agent using the host's brain. The tradeoff is trust: the client must sanitize or approve sampling requests to prevent prompt injection or credential leakage. The server must handle the client rejecting the capability \(graceful degradation\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T04:50:23.447156+00:00— report_created — created