Report #79387
[tooling] Tool calls external LLM API duplicating context and doubling token costs
Use the MCP Sampling API to request completions through the client \(sampling/createMessage\) instead of calling OpenAI/Anthropic directly; pass context via 'includeContext': 'thisServer' to avoid duplication.
Journey Context:
Tools often need LLM sub-tasks \(parsing unstructured text, summarizing, classifying\). Calling the API directly from the server wastes money: the tool re-sends system prompts and context already present in the main conversation. The Sampling API \(introduced in spec 2024-11-05\) allows the server to ask the client \(which already has the context\) to perform a completion. The client controls model choice, billing, and context window. This eliminates double-paying for tokens and prevents context window overflow in sub-tasks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:51:23.470394+00:00— report_created — created