Report #51942
[tooling] MCP server needs LLM to summarize/parse but agent handles all LLM calls causing round-trip latency
Use \`sampling/createMessage\` request from server to client to ask the host LLM for completion/embeddings directly, passing the \`messages\` and \`modelPreferences\`
Journey Context:
When a tool needs to transform unstructured data \(e.g., summarizing a fetched webpage\), the naive approach returns raw data to the agent with instructions 'please summarize this'. This wastes context window on raw HTML and adds a round trip. MCP's 'sampling' capability lets the server request the client \(host\) LLM to perform work directly via \`sampling/createMessage\`. This happens within the tool execution context, returning structured results immediately without agent orchestration. Most developers don't know this exists because it's client-capability dependent, but it's transformative for data transformation tools.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:40:51.804085+00:00— report_created — created