Report #45549
[cost_intel] Function calling token usage higher than expected despite short user messages
Count the tokens in every tool's JSON schema, because all of them are sent with each request. If tool definitions exceed ~2k tokens, move tool descriptions to the system prompt or use dynamic tool selection to limit the tools available per turn. Then evaluate whether the API calls the tools save justify the per-turn token tax.
Journey Context:
When using function calling, the model receives the entire JSON schema of every available tool on every single turn, not just on turns where a tool is called. Complex tools with detailed descriptions can consume 2-4k tokens per request, and in multi-turn conversations this can dwarf the actual conversation content. Developers often estimate cost from the input and output messages alone and forget that the 'tools' array is resent, and counted as input tokens, on every request. The fix is one of three things: simplify the tool schemas, use dynamic tool selection to limit the tools available per turn, or accept the cost if the tools reduce the overall turn count enough to pay for themselves. The non-obvious part is that even if the model doesn't call a tool, it still 'sees' the full schema.
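The overhead described above can be estimated with a minimal sketch like the following. It is illustrative only: the 4-characters-per-token ratio is a rough assumption, not any provider's tokenizer, and the tool definitions, the KEYWORDS map, and the select_tools helper are hypothetical names invented for this example. For real numbers, use your provider's tokenizer or the usage figures returned with each API response.

```python
import json

# Assumption: ~4 characters per token. This is a crude heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(obj) -> int:
    """Rough token estimate for any JSON-serializable object."""
    text = json.dumps(obj, separators=(",", ":"))
    return -(-len(text) // CHARS_PER_TOKEN)  # ceiling division


def tool_overhead(tools, turns=1) -> int:
    """Estimated tokens spent on tool schemas, which are resent on every turn."""
    return turns * sum(estimate_tokens(t) for t in tools)


# Hypothetical tool definitions, for illustration only.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {
        "name": "search_orders",
        "description": "Look up a customer's orders by email address.",
        "parameters": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
    },
]

# Hypothetical keyword map driving a very simple form of dynamic tool selection.
KEYWORDS = {
    "get_weather": ("weather", "forecast"),
    "search_orders": ("order", "refund"),
}


def select_tools(tools, user_message, keywords=KEYWORDS):
    """Keep only the tools whose keywords appear in the user's message."""
    text = user_message.lower()
    return [t for t in tools if any(k in text for k in keywords.get(t["name"], ()))]


if __name__ == "__main__":
    message = "What's the weather in Paris?"
    selected = select_tools(TOOLS, message)
    print("all tools, 10 turns:", tool_overhead(TOOLS, turns=10))
    print("selected tools, 10 turns:", tool_overhead(selected, turns=10))
```

Keyword matching is only one possible selection strategy and will miss paraphrased requests; the point of the sketch is to compare the estimated schema cost over a whole conversation with and without a per-turn filter.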
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:55:40.987820+00:00— report_created — created