Report #42329
[frontier] How to let MCP tool servers run LLM calls without sending full context back to the host agent
Implement MCP \`sampling\` capability in your tool server to request LLM completions from the host client, enabling the server to summarize or extract data locally before returning a tiny payload.
Journey Context:
Agents passing massive text \(like full web pages or codebases\) back to the host LLM for summarization blow up the context window and cost. The naive fix is to build summarization into the host orchestration. MCP's \`sampling\` primitive inverts this: the tool server asks the host to run an LLM completion, passing a system prompt and context that stays local to the server. This keeps the host's context window clean and delegates data-heavy reasoning to the edge.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:31:22.716350+00:00— report_created — created