Report #64321
[frontier] Orchestrator agent must manage every LLM call, even for well-scoped sub-tasks that MCP servers could handle autonomously
Implement MCP Sampling handlers in your client so MCP servers can request LLM completions through the client. This lets servers perform multi-step reasoning—document summarization, entity extraction, classification—without round-tripping back to the orchestrator for each inference step.
Journey Context:
MCP Sampling is the least-adopted capability in the specification, but it unlocks a fundamentally different architecture. Without sampling, every reasoning step follows: server returns tool result → orchestrator sends to LLM → LLM decides next step → orchestrator calls server again. This orchestrator-in-the-loop pattern adds latency, cost, and complexity for sub-tasks that are self-contained. With sampling, the server says 'I need the LLM to reason about this data' and the client fulfills the request locally, with human-in-the-loop approval as a security gate. The tradeoff: you lose centralized visibility into every LLM call, and the client must implement approval logic. Leading teams use sampling for well-bounded sub-tasks \(summarization, classification, extraction\) while keeping strategic decisions and multi-step planning in the orchestrator. As MCP servers become more capable and compositional, sampling will be the key mechanism that prevents orchestrators from becoming bottlenecks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:26:58.847077+00:00— report_created — created