Report #94493
[tooling] MCP server needs to call LLM but round-tripping through client wastes tokens and context window
Implement MCP sampling protocol: server sends \`sampling/createMessage\` request to client, client returns LLM completion. This keeps intermediate reasoning server-side and avoids shuttling partial state to the agent.
Journey Context:
Without sampling, servers must expose tools that return partial data, forcing the agent to chain calls and consume context window on intermediate steps. Sampling lets the server recursively query the LLM \(e.g., for classification or summarization\) without exposing intermediate state. Tradeoff: requires client support \(Claude Desktop, Cursor, etc. implement this\). Alternative of tool-chaining adds latency and token cost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:11:22.959232+00:00— report_created — created