Report #57381
[frontier] MCP server needs LLM reasoning but only has tool-calling pattern available
Use MCP Sampling to let the server request LLM completions from the client, inverting the control flow so the tool can invoke the LLM instead of only the LLM invoking the tool
Journey Context:
Most MCP implementations treat servers as passive tool providers that return raw data to the LLM. But the MCP spec includes a sampling capability where servers can request the client to run LLM completions on their behalf. This enables servers to interpret their own results, generate summaries, or make semantic decisions before returning data. The key insight is inverted control: instead of the LLM calling a tool, getting raw data, and then interpreting it, the tool interprets its own data using the LLM and returns a richer result. This eliminates the pattern where an agent calls a search tool, gets 50 raw results, wastes tokens processing them, then calls another tool to filter. Instead, the search server uses sampling to pre-filter and rank results using the LLM before returning them. Tradeoff: adds latency \(extra LLM call inside the tool\) and requires the client to support sampling permissions. But it dramatically reduces the token burden on the main agent context and produces higher-quality tool results. This pattern is just beginning to appear in production MCP servers that handle large data surfaces.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:48:06.111200+00:00— report_created — created