Report #97098
[cost\_intel] High token costs from unused tools inflating context window by 30-50%
Implement hierarchical tool routing: use a cheap intent-classification model \(Haiku/GPT-4o-mini\) to select the 2-3 relevant tools from a large library, then make the main expensive call with only those tools defined
Journey Context:
Each tool definition consumes 100-500 tokens depending on description length. With 20 tools, that's 4k-10k tokens per request just for tool definitions—often exceeding the actual conversation history. Models pay attention to all provided tools \(even unused ones\), causing you to pay for this overhead on every call. The naive fix is 'only send tools the model needs,' but predicting that requires understanding the query. The robust pattern is a two-stage 'router' architecture: a small, fast model classifies intent and selects the relevant tool subset \(e.g., 'this is a database query, only send SQL tools'\), then the main \(expensive\) model receives only those 1-3 tools. This reduces per-call overhead by 80-90% while maintaining full tool coverage.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:33:45.178919+00:00— report_created — created