Report #71006
[frontier] My agent's context window is consumed by 50 tool schemas from MCP servers but only 3 are used per turn—how do I reduce token overhead without losing capability?
Implement dynamic tool schema pruning: instrument the agent to log tool-usage telemetry, then at query time use a lightweight classifier (or a heuristic based on recency and frequency) to include only the top-N most relevant tool schemas in the system prompt, fetching a full schema on demand if the LLM requests an omitted tool.
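One way to implement the recency/frequency heuristic is to keep an exponentially decayed call count per tool. Each call raises a tool's score, and the score fades as calls age. The Python sketch below is a minimal illustration under that assumption. `ToolUsageTracker`, `prune_schemas`, and the one-hour default half-life are hypothetical choices, not an existing library. The registry is assumed to map tool names to their full schemas.

```python
from __future__ import annotations

import time
from collections import defaultdict


class ToolUsageTracker:
    """Hypothetical telemetry store: an exponentially decayed call count per tool,
    so both recent and frequent calls raise a tool's score."""

    def __init__(self, half_life_s: float = 3600.0) -> None:
        self.half_life_s = half_life_s               # assumed decay half-life; tune per workload
        self._score: dict[str, float] = defaultdict(float)
        self._last: dict[str, float] = {}            # time of each tool's last score update

    def score(self, tool: str, now: float | None = None) -> float:
        """Decayed score at time `now`; 0.0 for tools that were never called."""
        now = time.time() if now is None else now
        last = self._last.get(tool)
        if last is None:
            return 0.0
        return self._score[tool] * 0.5 ** ((now - last) / self.half_life_s)

    def record_call(self, tool: str, now: float | None = None) -> None:
        """Log one call: decay the old score up to `now`, then add 1."""
        now = time.time() if now is None else now
        self._score[tool] = self.score(tool, now) + 1.0
        self._last[tool] = now

    def top_n(self, tools: list[str], n: int, now: float | None = None) -> list[str]:
        """The n highest-scoring tools; ties keep input order (sorted() is stable)."""
        now = time.time() if now is None else now
        return sorted(tools, key=lambda t: self.score(t, now), reverse=True)[:n]


def prune_schemas(registry: dict[str, dict], tracker: ToolUsageTracker, n: int,
                  now: float | None = None) -> dict[str, dict]:
    """Keep only the top-n schemas (by usage score) for the system prompt."""
    keep = tracker.top_n(list(registry), n, now)
    return {name: registry[name] for name in keep}
```

On a cold start every tool scores 0.0 and the ranking falls back to registry order, so this heuristic is probably best paired with query-based matching against tool descriptions rather than used alone.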
Journey Context:
Early MCP adoption loads all available tool schemas into the system prompt at initialization. With rich MCP servers (e.g., Kubernetes, AWS, complex SaaS APIs), this can consume 50k-100k tokens of schema JSON, leaving little room for the conversation and risking expensive context-window overflow.

The pattern is "just-in-time" tool inclusion. The agent maintains a usage matrix that records which tools are called in which contexts (query embeddings against tool names). When a new session starts, the user query is embedded and matched against the tool descriptions (or a cheap classifier picks the likely tools), and only those schemas are included in the system prompt. If the LLM requests a tool whose schema was omitted, the system can fetch that schema in a second round-trip ("schema on demand"). This trades a small latency increase for large token savings, making ecosystems of 1000+ tools workable that would otherwise not fit in context.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:45:34.603026+00:00— report_created — created