Report #56950
[cost_intel] Function tool definitions consume 3-5x more tokens than the actual tool outputs save, turning multi-tool agents into budget burners
Compress tool schemas with JSON Schema minimization (removing descriptions, examples, and default values), and move to a 'hidden thinking' pattern in which a cheap model pre-filters whether tools are needed before the expensive model runs.
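A minimal sketch of the minimization step, assuming tool definitions are plain JSON Schema dictionaries. The function name `minimize_schema`, the `DROP_KEYS` set, and the `get_weather` tool are all hypothetical choices for illustration, and serialized character count is used only as a rough proxy for tokens.

```python
import json

# Annotation-style keywords to strip. This set is an illustrative
# assumption, not a standard; adjust it to what your provider tolerates.
DROP_KEYS = {"description", "examples", "default", "title"}


def minimize_schema(node, in_properties=False):
    """Recursively drop annotation keywords from a JSON Schema fragment.

    Keys directly under "properties" are property names, not keywords,
    so a parameter that happens to be called "description" is kept.
    """
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            if not in_properties and key in DROP_KEYS:
                continue
            if not in_properties and key in ("enum", "const"):
                out[key] = value  # literal values, not schemas: leave untouched
                continue
            child_is_property_map = key == "properties" and not in_properties
            out[key] = minimize_schema(value, in_properties=child_is_property_map)
        return out
    if isinstance(node, list):
        return [minimize_schema(item) for item in node]
    return node


def size(obj):
    """Compact serialized length in characters (a rough proxy for tokens)."""
    return len(json.dumps(obj, separators=(",", ":")))


# Hypothetical tool definition used only for the demonstration.
tool = {
    "name": "get_weather",
    "description": "Look up current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name", "examples": ["Paris"]},
            "units": {"type": "string", "enum": ["c", "f"], "default": "c"},
        },
        "required": ["city"],
    },
}

# Only the parameter schema is minimized here; the top-level description is kept.
slim = {**tool, "parameters": minimize_schema(tool["parameters"])}
print(size(tool), "->", size(slim))
```

The sketch keeps the tool's top-level description and strips only the parameter schema. Whether the top-level description can also go is worth checking against tool-selection accuracy before relying on it.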
Journey Context:
Developers often assume tool use reduces costs by letting the model delegate work. In practice, every tool definition is resent in the context window on each turn. Ten tools with a 500-token JSON schema each add about 5K tokens to every API call; over a 20-turn conversation, that is 100K tokens of repeated schema alone. The fix is aggressive schema compression: drop human-readable descriptions and rely on terse, self-explanatory keys, strip examples and defaults, and share repeated sub-schemas with $ref. More advanced: use a cheap model (e.g., Claude 3 Haiku or GPT-4o-mini) as a 'router' that decides whether tools are needed before the expensive model is invoked with the full tool context. Reported savings are 70-90% in agent workflows, though that figure is unverified and depends on how often the router can skip tools.
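The arithmetic and the router idea above can be sketched as follows. This is an illustration under stated assumptions, not a provider integration: `schema_overhead`, `answer`, `keyword_router`, and `fake_expensive_model` are hypothetical names, and the keyword check merely stands in for a call to a cheap model.

```python
from typing import Callable, Sequence


def schema_overhead(tokens_per_tool: int, n_tools: int, turns: int) -> int:
    """Tokens spent resending tool schemas over a whole conversation."""
    return tokens_per_tool * n_tools * turns


def answer(
    prompt: str,
    tools: Sequence[dict],
    needs_tools: Callable[[str], bool],
    call_model: Callable[[str, Sequence[dict]], str],
) -> str:
    """Attach tool schemas to the expensive call only when the router asks for them."""
    selected = tools if needs_tools(prompt) else []
    return call_model(prompt, selected)


# Stand-in for the cheap router model (assumption: a real router would be
# a small-model API call returning a yes/no decision).
def keyword_router(prompt: str) -> bool:
    return any(word in prompt.lower() for word in ("weather", "search"))


# Stand-in for the expensive model; it only reports how many schemas it received.
def fake_expensive_model(prompt: str, tools: Sequence[dict]) -> str:
    return f"answered with {len(tools)} tool schema(s) in context"


tools = [{"name": f"tool_{i}"} for i in range(10)]

print(schema_overhead(500, 10, 20))  # 500 * 10 * 20 = 100000
print(answer("What's the weather in Oslo?", tools, keyword_router, fake_expensive_model))
print(answer("Rewrite this sentence politely.", tools, keyword_router, fake_expensive_model))
```

Under this sketch, the schema tokens saved are roughly proportional to the share of turns on which the router skips tools, minus whatever the router calls themselves cost. A router that wrongly skips tools will also degrade answers, so its decisions are worth spot-checking.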
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T02:04:48.872608+00:00— report_created — created