Report #26390
[cost_intel] Function calling tool schemas consume more input tokens than they save in output
Minimize tool schema descriptions, implement a tool-router pattern in which a cheap model selects the tool, and send only the single relevant tool schema to the expensive model.
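The first step, shortening schema descriptions, can be sketched in Python. This is a minimal sketch under stated assumptions. Tool definitions are treated as plain dicts shaped like Anthropic's `name`/`description`/`input_schema` format; OpenAI instead nests the schema under `parameters`, but the recursive walk handles either. The helper names (`first_sentence`, `minimize_schema`, `rough_token_estimate`) and the sample `search_orders` tool are hypothetical. The 4-characters-per-token figure is only a rough heuristic, not the provider's tokenizer.

```python
import json
import re


def first_sentence(text: str) -> str:
    """Return only the first sentence of a description (the whole text if no terminator)."""
    match = re.match(r"(.+?[.!?])(?:\s|$)", text.strip(), flags=re.S)
    return match.group(1) if match else text.strip()


def minimize_schema(node):
    """Recursively shorten every string-valued 'description' key; returns a new object."""
    if isinstance(node, dict):
        return {
            key: (first_sentence(value)
                  if key == "description" and isinstance(value, str)
                  else minimize_schema(value))
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [minimize_schema(item) for item in node]
    return node


def rough_token_estimate(tool: dict) -> int:
    """Very rough proxy: ~4 characters per token of compact JSON (assumption, not a tokenizer)."""
    return len(json.dumps(tool, separators=(",", ":"))) // 4


# Hypothetical tool definition in Anthropic-style name/description/input_schema shape.
tool = {
    "name": "search_orders",
    "description": "Search a customer's past orders. Supports filtering by date range, "
                   "status, and SKU; results are paginated and sorted newest first.",
    "input_schema": {
        "type": "object",
        "properties": {
            "customer_id": {"type": "string",
                            "description": "Internal customer ID. Usually found in the CRM record header."},
            "status": {"type": "string", "enum": ["open", "shipped", "returned"],
                       "description": "Order status filter. Omit to include all statuses."},
        },
        "required": ["customer_id"],
    },
}

small = minimize_schema(tool)
print(rough_token_estimate(tool), "->", rough_token_estimate(small))
```

Shortening descriptions trades token cost against tool-selection accuracy, because the model relies on those descriptions to choose a tool and fill its arguments. Check behavior after trimming.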
Journey Context:
When using function calling, the JSON Schema for every available tool is serialized and injected into the system prompt on every API call. A complex tool with nested objects and detailed descriptions can consume 500-2,000 tokens, so 10 such tools add 5,000-20,000 tokens of input overhead per request, even if the model never invokes a tool. Developers often assume tools reduce cost by shortening output, but the input overhead is re-sent on every turn and often dominates in multi-turn conversations, especially as tool definitions grow verbose. Alternatives such as dynamic tool injection (sending a different tool subset per turn) risk 'tool blindness' when the model needs a tool that is absent from that turn. The more robust pattern is a 'tool router': a cheap, fast model (e.g., Claude 3 Haiku, GPT-3.5) receives the user query and selects the single relevant tool; the expensive model (e.g., Opus, GPT-4o) then receives only that tool's schema, eliminating the bloat from unused tools.
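The router pattern can be sketched in Python as a minimal illustration under stated assumptions. `fake_cheap_model` is a keyword-matching stand-in for a real Haiku/GPT-3.5-class API call, and `TOOLS` is a hypothetical registry. Falling back to every schema when the router's answer is unrecognized is one possible guard against tool blindness; the original report does not specify it. The router prompt carries only each tool's name and one-line description, not the full schemas, so the cheap call's own input stays small.

```python
import json
from typing import Callable

# Hypothetical tool registry: full schemas keyed by name (illustrative only).
TOOLS = {
    "get_weather": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "input_schema": {"type": "object",
                         "properties": {"city": {"type": "string"}},
                         "required": ["city"]},
    },
    "search_orders": {
        "name": "search_orders",
        "description": "Search a customer's past orders.",
        "input_schema": {"type": "object",
                         "properties": {"customer_id": {"type": "string"},
                                        "status": {"type": "string",
                                                   "enum": ["open", "shipped", "returned"]}},
                         "required": ["customer_id"]},
    },
    "create_ticket": {
        "name": "create_ticket",
        "description": "Open a support ticket.",
        "input_schema": {"type": "object",
                         "properties": {"subject": {"type": "string"},
                                        "body": {"type": "string"}},
                         "required": ["subject", "body"]},
    },
}


def build_catalog(tools: dict) -> str:
    """One line per tool: the router sees names and short descriptions, not full schemas."""
    return "\n".join(f"{name}: {spec['description']}" for name, spec in tools.items())


def route(query: str, tools: dict, cheap_model: Callable[[str], str]) -> list:
    """Ask a cheap model to pick one tool; return the schemas to send to the expensive model."""
    prompt = (
        "Pick the single tool that best serves the query, or reply NONE.\n"
        f"Tools:\n{build_catalog(tools)}\n\n"
        f"Query: {query}\n"
        "Answer with the tool name only."
    )
    choice = cheap_model(prompt).strip()
    if choice == "NONE":
        return []                    # no tool needed: send zero schemas
    if choice in tools:
        return [tools[choice]]       # common case: exactly one schema
    return list(tools.values())      # unrecognized answer: send all tools to avoid tool blindness


def fake_cheap_model(prompt: str) -> str:
    """Stand-in for a real cheap-model call: naive keyword matching on the query line."""
    query = prompt.split("Query: ", 1)[1].split("\n", 1)[0].lower()
    if "weather" in query:
        return "get_weather"
    if "order" in query:
        return "search_orders"
    if "hello" in query:
        return "NONE"
    return "unsure"


def schema_tokens(schemas: list) -> int:
    """Rough input-size proxy (~4 characters per token of compact JSON; an assumption)."""
    return sum(len(json.dumps(s, separators=(",", ":"))) // 4 for s in schemas)


selected = route("What's the weather in Oslo?", TOOLS, fake_cheap_model)
print([s["name"] for s in selected])  # ['get_weather']
print(schema_tokens(list(TOOLS.values())), "->", schema_tokens(selected))
```

In a real integration, the list returned by `route` would be passed as the `tools` argument of the expensive model's request. The router adds one extra cheap request per turn, so the pattern pays off only when that call costs less than the schema overhead it removes.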
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:41:56.698571+00:00— report_created — created