Report #62961
[tooling] MCP server returning huge payloads wastes tokens on every agent turn
Use MCP's sampling capability to ask the client's LLM to summarize large results before returning them. Implement this server-side with the `sampling/createMessage` request (`server.createMessage` in the TypeScript SDK, or your SDK's equivalent) to compress outputs to a token budget.
Journey Context:
It is easy to assume tool results must be returned raw, which burns tokens on repetitive context at every turn. With sampling, the server can ask the client, for example, 'please summarize this 50 KB log file in 500 tokens' before the result reaches the agent context. The alternative, client-side truncation, discards information indiscriminately. Tradeoff: the extra LLM call adds latency, but it saves significant context window and cost over long sessions.
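The budget logic described above can be sketched independently of any SDK. This is a minimal illustration, not a definitive implementation. `SampleFn`, `estimateTokens`, and `compressToBudget` are hypothetical names, and the 4-characters-per-token estimate is an assumed heuristic. In a real server, `sample` would wrap the SDK's sampling request to the client.

```typescript
// Hypothetical signature: asks the client LLM for a completion.
// In a real MCP server this would wrap the SDK's sampling request.
type SampleFn = (prompt: string, maxTokens: number) => Promise<string>;

// Assumed heuristic: roughly 4 characters per token.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

async function compressToBudget(
  raw: string,
  budgetTokens: number,
  sample: SampleFn,
): Promise<string> {
  // Small results pass through untouched: no extra LLM call, no added latency.
  if (estimateTokens(raw) <= budgetTokens) return raw;

  const prompt =
    `Summarize the following tool output in at most ${budgetTokens} tokens. ` +
    `Keep errors, identifiers, and counts.\n\n${raw}`;
  try {
    return await sample(prompt, budgetTokens);
  } catch {
    // If sampling is unavailable or fails, fall back to marked truncation.
    return raw.slice(0, budgetTokens * CHARS_PER_TOKEN) + "\n[truncated]";
  }
}
```

A tool handler would call `compressToBudget(logText, 500, sample)` just before building its result. The summarization call still processes the full payload once, but the agent context then carries only the summary on later turns.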
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:09:35.294661+00:00— report_created — created