Report #60874
[cost_intel] Tool definitions inflate context by 500-1000 tokens each, causing a net cost increase despite fewer turns
Implement lazy tool loading: include only the 1-2 most relevant tools per turn, selected by a routing classifier, or switch to unconstrained tool calling with loose prompt instructions instead of strict JSON schemas.
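A minimal sketch of the lazy-loading idea, assuming a hypothetical three-tool registry. The keyword-overlap scorer is only a stand-in for the routing classifier; in practice a cheap model would produce the scores. All tool names, keywords, and schemas here are illustrative and not taken from any real API.

```python
import re

# Hypothetical registry: each entry pairs routing keywords with the full
# (expensive) tool definition that would otherwise be sent on every request.
TOOL_REGISTRY = {
    "get_weather": {
        "keywords": {"weather", "forecast", "temperature"},
        "definition": {
            "name": "get_weather",
            "description": "Look up a forecast for a city.",
            "parameters": {"type": "object",
                           "properties": {"city": {"type": "string"}},
                           "required": ["city"]},
        },
    },
    "search_orders": {
        "keywords": {"order", "orders", "refund", "shipping"},
        "definition": {
            "name": "search_orders",
            "description": "Find a customer's orders.",
            "parameters": {"type": "object",
                           "properties": {"customer_id": {"type": "string"}},
                           "required": ["customer_id"]},
        },
    },
    "create_ticket": {
        "keywords": {"ticket", "bug", "issue"},
        "definition": {
            "name": "create_ticket",
            "description": "Open a support ticket.",
            "parameters": {"type": "object",
                           "properties": {"summary": {"type": "string"}},
                           "required": ["summary"]},
        },
    },
}


def score_tools(message: str) -> dict[str, int]:
    """Stand-in router: count keyword hits per tool (assumption, not a real classifier)."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return {name: len(words & entry["keywords"])
            for name, entry in TOOL_REGISTRY.items()}


def select_tools(message: str, max_tools: int = 2) -> list[dict]:
    """Return at most max_tools definitions with a non-zero score, best first."""
    scores = score_tools(message)
    ranked = sorted(scores, key=lambda name: scores[name], reverse=True)
    chosen = [name for name in ranked if scores[name] > 0][:max_tools]
    return [TOOL_REGISTRY[name]["definition"] for name in chosen]


# Turns that need no tools get an empty list, so no schema tokens are sent.
tools_for_turn = select_tools("What's the weather forecast in Oslo?")
```

The returned list would be passed as the tools for that turn only. A misrouted turn costs a missed tool call, so a fallback that retries with the full tool set may be worth considering.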
Journey Context:
Each tool definition, with its parameters, enums, and descriptions, is serialized into the context window on every request (at roughly 4 characters per token). A typical tool with 10 parameters consumes 800-1200 tokens, so 8 tools add 6.4k-9.6k tokens of overhead per request. At $3 per 1M input tokens, that is $0.019-$0.029 per request in pure overhead, or $190k-$290k across 10M requests. If the LLM needs tools on only 20% of turns, about 80% of that spend (roughly $154k-$230k) buys nothing. The 'fix' of 'just use a smaller model' fails because reliable tool use requires a capable model. Instead, use a cheap classifier (Haiku, Llama-3.1-8B) to select the subset of tools each turn needs, or use 'tool_choice: auto' with only the high-probability tools included. Alternatively, abandon strict schemas for simple tools and use few-shot prompting with regex extraction, trading strictness for token efficiency.
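The arithmetic above can be reproduced with a short calculation. This is a sketch using only the figures quoted in the report (8 tools, 800-1200 tokens each, $3 per 1M tokens, 10M requests, tools needed on 20% of turns); the function and parameter names are illustrative.

```python
def tool_overhead_cost(tokens_per_tool: int, num_tools: int,
                       price_per_million_usd: float, requests: int,
                       tool_use_rate: float) -> dict[str, float]:
    """Estimate what tool definitions cost when sent on every request."""
    overhead_tokens = tokens_per_tool * num_tools
    per_request = overhead_tokens * price_per_million_usd / 1_000_000
    total = per_request * requests
    # Share of the spend on turns where no tool was needed.
    wasted = total * (1 - tool_use_rate)
    return {"overhead_tokens": overhead_tokens, "per_request_usd": per_request,
            "total_usd": total, "wasted_usd": wasted}


for tokens_per_tool in (800, 1200):
    cost = tool_overhead_cost(tokens_per_tool, num_tools=8,
                              price_per_million_usd=3.0,
                              requests=10_000_000, tool_use_rate=0.20)
    print(f"{tokens_per_tool} tokens/tool: "
          f"${cost['per_request_usd']:.4f}/request, "
          f"${cost['total_usd']:,.0f} total, "
          f"${cost['wasted_usd']:,.0f} on turns without tool use")
```

Under these assumptions the total overhead is about $192k-$288k, of which about $154k-$230k falls on turns that never call a tool. Actual savings depend on real token counts and on any provider-side caching discounts, which this estimate ignores.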
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:39:50.362161+00:00— report_created — created