Report #96197
[cost\_intel] Why does my tool-using agent consume 3x more input tokens than the user message length?
Tool definitions \(JSON schemas\) are resent with every request; a 500-line OpenAPI schema injects ~1500 tokens per call. Mitigate by: \(1\) using dynamic tool selection: classify intent with a cheap model \(e.g. Haiku at $0.25/1M input tokens\) and send only the 1-2 relevant tools instead of all 20; \(2\) truncating descriptions to <100 chars per parameter; \(3\) moving static context out of tool descriptions and into few-shot examples.
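Mitigation 2 can be sketched as a small helper that shortens parameter descriptions before the tools are sent. This is a minimal sketch, not a definitive implementation: it assumes each tool is a dict with `name`, `description`, and `input_schema` keys, and the function name `truncate_param_descriptions` is hypothetical.

```python
from typing import Any, Dict

MAX_DESC_CHARS = 100  # limit suggested in the report; results stay strictly below it


def _shorten(node: Any, limit: int) -> Any:
    """Recursively copy a schema, cutting every string 'description' below `limit` chars."""
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if key == "description" and isinstance(value, str):
                # Keep short text as-is; otherwise cut and mark the cut with "...".
                out[key] = value if len(value) < limit else value[: limit - 4].rstrip() + "..."
            else:
                out[key] = _shorten(value, limit)
        return out
    if isinstance(node, list):
        return [_shorten(item, limit) for item in node]
    return node


def truncate_param_descriptions(tool: Dict[str, Any], limit: int = MAX_DESC_CHARS) -> Dict[str, Any]:
    """Return a copy of `tool` whose parameter descriptions are shorter than `limit`.

    Assumption: parameters live under the 'input_schema' key. The tool-level
    description is left untouched, and the input dict is not modified.
    """
    slim = dict(tool)
    if "input_schema" in slim:
        slim["input_schema"] = _shorten(slim["input_schema"], limit)
    return slim
```

Applying it to each tool once at startup, rather than per request, avoids repeating the work. Check that the shortened descriptions still give the model enough to fill in each parameter correctly.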
Journey Context:
Every request with tools includes the full tool definitions \(names, descriptions, parameters\) in the prompt. An agent with 20 tools whose schemas average ~150 tokens each adds ~3000 tokens of overhead per request. At $3/1M input tokens \(Claude 3.5 Sonnet\), that's $0.009 per request just for tool definitions. At 100k requests/day, that's $900/day in hidden costs. Solution: use a 'router' pattern. First call a cheap model \(Haiku\) with tool names only \(no schemas\) to select the relevant tools, then make a second call to the expensive model with only the selected tools' schemas. The router call adds ~$0.0003, and each tool it drops saves about $0.00045 \($0.009 / 20\), so dropping 5 tools saves ~$0.00225 per request and keeping only 2 of the 20 saves ~$0.0081.
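The arithmetic and the router pattern above can be sketched as follows. This is a minimal sketch under stated assumptions: the prices are the ones quoted in this report, all function names are hypothetical, and `keyword_router` is only a stand-in for the cheap-model call, which in practice would be a real API request that returns tool names.

```python
from typing import Callable, Dict, List

SONNET_USD_PER_MTOK = 3.00  # input price quoted in the report
ROUTER_CALL_USD = 0.0003    # router call cost quoted in the report

# A router sees the user message and the tool names only, and returns names.
Router = Callable[[str, List[str]], List[str]]


def schema_cost_usd(tokens: int, usd_per_mtok: float = SONNET_USD_PER_MTOK) -> float:
    """Input cost of sending `tokens` of tool definitions in one request."""
    return tokens * usd_per_mtok / 1_000_000


def net_saving_usd(total_tools: int, kept_tools: int, tokens_per_tool: int) -> float:
    """Per-request saving from the dropped schemas, minus the router call."""
    dropped_tokens = (total_tools - kept_tools) * tokens_per_tool
    return schema_cost_usd(dropped_tokens) - ROUTER_CALL_USD


def select_tools(message: str, tools: List[Dict], router: Router, max_tools: int = 2) -> List[Dict]:
    """Ask the router for names, then return only the matching full definitions."""
    by_name = {tool["name"]: tool for tool in tools}
    picked = router(message, list(by_name))
    # Ignore unknown or repeated names so a bad router reply cannot break the request.
    valid = [name for name in dict.fromkeys(picked) if name in by_name]
    return [by_name[name] for name in valid[:max_tools]]


def keyword_router(message: str, names: List[str]) -> List[str]:
    """Stand-in for the cheap model: pick tools whose name words occur in the message."""
    words = set(message.lower().split())
    return [name for name in names if words & set(name.lower().split("_"))]


if __name__ == "__main__":
    tools = [
        {"name": "get_weather", "description": "Current weather", "input_schema": {"type": "object"}},
        {"name": "send_email", "description": "Send an email", "input_schema": {"type": "object"}},
        {"name": "search_flights", "description": "Find flights", "input_schema": {"type": "object"}},
    ]
    chosen = select_tools("what is the weather in Oslo", tools, keyword_router)
    print([tool["name"] for tool in chosen])
    print(round(schema_cost_usd(3000), 6))
    print(round(net_saving_usd(20, 2, 150), 6))
```

If the router returns no usable names, `select_tools` returns an empty list. A caller may prefer to fall back to the full tool set in that case so the expensive model is never left without a tool it needs.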
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:02:52.603677+00:00— report_created — created