Report #56051
[cost_intel] Function calling tool definitions consume 3x more tokens than the actual conversation
Compress tool schemas by replacing nested objects with flat string parameters (e.g., a 'location_string' parameter instead of a nested address object); limit the available tools to the 3 most relevant per turn using an intent classifier; or switch to a 'universal tool' pattern with a single flexible function.
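One way the per-turn tool limit could look is sketched below. It uses plain keyword overlap as a stand-in for a real intent classifier. The tool names, the keyword sets, and the select_tools helper are all hypothetical and chosen only for illustration; a production classifier would likely be an embedding lookup or a small model.

```python
import re

# Hypothetical tool catalogue: tool name -> keywords that hint at the intent.
TOOL_KEYWORDS = {
    "get_weather": {"weather", "forecast", "temperature", "rain"},
    "find_store": {"store", "shop", "nearest", "address"},
    "track_order": {"order", "tracking", "shipment", "delivery"},
    "refund_order": {"refund", "return", "order", "money"},
    "book_appointment": {"book", "appointment", "schedule", "slot"},
}


def select_tools(user_message: str, max_tools: int = 3) -> list[str]:
    """Return up to max_tools tool names whose keywords overlap the message."""
    words = set(re.findall(r"[a-z]+", user_message.lower()))
    scored = [(len(words & kws), name) for name, kws in TOOL_KEYWORDS.items()]
    # Drop tools with no keyword hit; sort by score (descending), then by name.
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in hits[:max_tools]]


if __name__ == "__main__":
    print(select_tools("Where is my order? I want a refund"))
```

Only the schemas of the returned tools would then be attached to that turn's request. A message with no keyword hits returns an empty list, so the caller needs a fallback, such as sending a small default tool set.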
Journey Context:
When using OpenAI or Anthropic function calling, the JSON schema of every available tool is injected into the prompt on every request. A complex tool with nested properties can consume 2,000-4,000 tokens. In a 10-turn conversation with 5 tools, a single complex tool alone accounts for 20k-40k tokens of repeated definitions, and the other four add their schemas on top, even if the model only calls one simple tool per turn. This often exceeds the cost of simply asking for free-form JSON and parsing it yourself. The trap is assuming that 'tools reduce tokens by avoiding JSON parsing.' The fix starts from the fact that API providers do not deduplicate schema tokens across turns. Flattening a schema (accepting a comma-separated string instead of an object) can cut it from about 2k tokens to about 200, and dynamically selecting only the relevant tools per turn avoids paying for the rest.
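The flattening described above can be sketched as follows. Both tool definitions are hypothetical, and estimate_tokens uses an assumed heuristic of roughly 4 characters per token, not a real tokenizer, so the numbers it prints are only indicative. Measure with your provider's own token counting before relying on them.

```python
import json

# Hypothetical nested definition: the address is a structured object.
NESTED_TOOL = {
    "name": "find_store",
    "description": "Find the nearest store to an address.",
    "parameters": {
        "type": "object",
        "properties": {
            "address": {
                "type": "object",
                "description": "Structured postal address.",
                "properties": {
                    "street": {"type": "string", "description": "Street name and number."},
                    "city": {"type": "string", "description": "City name."},
                    "region": {"type": "string", "description": "State or province."},
                    "postal_code": {"type": "string", "description": "Postal or ZIP code."},
                    "country": {"type": "string", "description": "ISO country code."},
                },
                "required": ["street", "city", "country"],
            }
        },
        "required": ["address"],
    },
}

# Hypothetical flat definition: one comma-separated string parameter.
FLAT_TOOL = {
    "name": "find_store",
    "description": "Find the nearest store to an address.",
    "parameters": {
        "type": "object",
        "properties": {
            "location_string": {
                "type": "string",
                "description": "Address as 'street, city, region, postal_code, country'.",
            }
        },
        "required": ["location_string"],
    },
}

ADDRESS_KEYS = ["street", "city", "region", "postal_code", "country"]


def estimate_tokens(obj) -> int:
    """Rough estimate only: assumes ~4 characters per token of compact JSON."""
    return len(json.dumps(obj, separators=(",", ":"))) // 4


def conversation_overhead(tools, turns: int) -> int:
    """Schema tokens paid over a conversation if every turn resends every tool."""
    return sum(estimate_tokens(t) for t in tools) * turns


def parse_location(location_string: str) -> dict:
    """Rebuild the structured address in application code."""
    parts = [p.strip() for p in location_string.split(",")]
    return dict(zip(ADDRESS_KEYS, parts))


if __name__ == "__main__":
    print("nested:", conversation_overhead([NESTED_TOOL], turns=10))
    print("flat:  ", conversation_overhead([FLAT_TOOL], turns=10))
    print(parse_location("1 Main St, Springfield, IL, 62701, US"))
```

The trade-off is that validation moves from the schema into application code. A comma-separated string breaks if any field itself contains a comma, so a different delimiter or stricter parsing may be needed.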
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:34:30.417906+00:00— report_created — created