Report #74930
[cost_intel] Tool definitions inflating context window by 500-2000 tokens per tool regardless of usage
Compress JSON schemas by removing (or drastically shortening) 'description' fields and relying on short, self-descriptive tool and parameter names instead, eliminate 'examples' arrays, and set 'strict': true only when necessary. Shard tools across separate API calls using intent classification rather than including every tool in every request.
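A minimal Python sketch of the schema-compression step, assuming tools are plain dicts in the Anthropic-style `name`/`description`/`input_schema` shape (OpenAI nests the schema under `parameters`, which the recursive walk handles the same way). `minimize_schema` and `rough_tokens` are hypothetical helpers, not part of either SDK, and the 4-characters-per-token estimate is a crude assumption rather than a real tokenizer.

```python
import json

# Schema keywords the report suggests dropping. "examples" is a standard
# JSON Schema annotation; "description" appears on the tool and on properties.
DROP_KEYS = {"description", "examples"}


def minimize_schema(node, drop_keys=DROP_KEYS):
    """Return a new structure with drop_keys removed at every level.

    Property *names* under "properties" are always kept, even one that is
    literally called "description"; only schema keywords are stripped.
    """
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "properties" and isinstance(value, dict):
                # Keep every property name, minimize each property's schema.
                out[key] = {name: minimize_schema(sub, drop_keys)
                            for name, sub in value.items()}
            elif key in drop_keys:
                continue
            else:
                out[key] = minimize_schema(value, drop_keys)
        return out
    if isinstance(node, list):
        return [minimize_schema(item, drop_keys) for item in node]
    return node


def rough_tokens(obj):
    """Crude size estimate (~4 chars per token, an assumption). Use the
    provider's tokenizer or token-counting endpoint for real numbers."""
    return len(json.dumps(obj, separators=(",", ":"))) // 4


# Hypothetical tool definition for illustration.
weather_tool = {
    "name": "get_weather",
    "description": "Look up the current weather conditions for a given city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string",
                     "description": "City name, e.g. 'Paris'",
                     "examples": ["Paris", "Tokyo", "Lima"]},
            "units": {"type": "string", "enum": ["c", "f"],
                      "description": "Temperature units"},
        },
        "required": ["city"],
    },
}

slim = minimize_schema(weather_tool)
print(rough_tokens(weather_tool), "->", rough_tokens(slim))
```

If you follow the Journey Context's advice to keep 1-2 word descriptions rather than deleting them, replace the `continue` branch with a lookup into your own short-description table.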
Journey Context:
OpenAI and Anthropic include the full function/tool definition (JSON schema) in the context window of every request, not just when a tool is invoked. A complex tool with detailed OpenAPI-style descriptions can consume 1500+ tokens. With 10 tools, that's 15k input tokens (about $0.30-0.75 per request at input prices of $20-50 per million tokens), even for a 'hello' query. The common error is treating tools like API endpoints (pay-per-call) rather than context overhead (pay-per-inclusion). The fix is aggressive schema minimization: use 1-2 word descriptions and remove examples (which can cost 50-200 tokens each). Then add tool routing: use a cheap model (Haiku/3.5) to classify intent and select the 1-2 relevant tools, rather than sending all 10 to the expensive model. This cuts the tool overhead from 15k to 2k tokens, a 7.5x reduction on that portion of the input cost (before counting the cheap router call itself).
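The routing idea and the cost arithmetic above can be sketched in Python as follows. Everything here is hypothetical: the tool names, the `ROUTING_KEYWORDS` table (a keyword match standing in for the cheap classifier model so the sketch runs without API keys), and the $20 and $50 per-million-token prices, which are simply the input prices implied by the report's $0.30-0.75 figure for 15k tokens.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical tool registry. Real values would be full (minimized) definitions.
TOOLS: Dict[str, dict] = {
    "get_weather": {"name": "get_weather", "input_schema": {"type": "object"}},
    "search_orders": {"name": "search_orders", "input_schema": {"type": "object"}},
    "create_ticket": {"name": "create_ticket", "input_schema": {"type": "object"}},
}

# Hypothetical keyword table standing in for a cheap-model intent classifier.
ROUTING_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "get_weather": ("weather", "forecast", "temperature"),
    "search_orders": ("order", "shipping", "refund"),
    "create_ticket": ("bug", "ticket", "support"),
}

Classifier = Callable[[str, Sequence[str]], List[str]]


def keyword_classifier(query: str, tool_names: Sequence[str]) -> List[str]:
    """Stand-in for the router model. A real router would prompt a small
    model with the query plus bare tool names and parse the names returned."""
    q = query.lower()
    return [n for n in tool_names
            if any(k in q for k in ROUTING_KEYWORDS.get(n, ()))]


def route_tools(query: str, tools: Dict[str, dict], classify: Classifier,
                max_tools: int = 2) -> List[dict]:
    """Return only the tool definitions to attach to the expensive call."""
    chosen = [n for n in classify(query, list(tools)) if n in tools]
    return [tools[n] for n in chosen[:max_tools]]


def tool_overhead(num_tools: int, tokens_per_tool: int,
                  usd_per_million_input: float) -> Tuple[int, float]:
    """Tokens and input cost that tool definitions add to ONE request."""
    tokens = num_tools * tokens_per_tool
    return tokens, tokens * usd_per_million_input / 1_000_000


if __name__ == "__main__":
    # A greeting matches no tool, so the expensive call carries none.
    print([t["name"] for t in route_tools("hello", TOOLS, keyword_classifier)])
    print([t["name"] for t in route_tools("Where is my order?", TOOLS, keyword_classifier)])
    # The report's arithmetic: 10 tools x 1,500 tokens = 15,000 tokens.
    for price in (20.0, 50.0):  # assumed $/1M input tokens
        tokens, usd = tool_overhead(10, 1500, price)
        print(f"{tokens} tokens at ${price}/M -> ${usd:.2f} per request")
    print(f"15k -> 2k tool tokens: {15_000 / 2_000}x fewer")
```

The design point is that the router sees only tool names, never full schemas, and the expensive model sees only the definitions the router selected; `max_tools` caps that at the report's 1-2 tools.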
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:22:11.543045+00:00 - report_created - created