Report #80170
[cost\_intel] Function calling tool schemas consume more tokens than the actual tool outputs save
Pre-compute the token count of each tool definition; abandon function calling if a schema exceeds 200 tokens per tool, and switch to few-shot parsing or a fine-tuned JSON mode for simple structured extraction.
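A minimal sketch of the pre-compute step, assuming a rough 4-characters-per-token heuristic instead of a real tokenizer. The tool definition, the helper names, and the heuristic are all illustrative assumptions. Providers also wrap tool definitions in their own internal format, so the input token count reported in an actual API response is the number to trust.

```python
import json

# Hypothetical tool definition, used only for illustration.
TOOL = {
    "name": "get_account_status",
    "description": "Look up the status of an account by its ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "account_id": {"type": "string", "description": "Account identifier."}
        },
        "required": ["account_id"],
    },
}

CHARS_PER_TOKEN = 4        # assumption: rough average, not a real tokenizer
SCHEMA_TOKEN_BUDGET = 200  # per-tool threshold suggested in this report


def estimate_schema_tokens(tool: dict) -> int:
    """Approximate the input tokens a tool definition adds to each request."""
    serialized = json.dumps(tool, separators=(",", ":"))
    # Ceiling division so a partial token still counts as one.
    return -(-len(serialized) // CHARS_PER_TOKEN)


def over_budget(tool: dict, budget: int = SCHEMA_TOKEN_BUDGET) -> bool:
    """True when the estimated schema size exceeds the per-tool budget."""
    return estimate_schema_tokens(tool) > budget
```

Running `over_budget` over every registered tool at build time gives an early signal; swapping the heuristic for the provider's own tokenizer or reported usage would tighten the estimate.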
Journey Context:
OpenAI and Anthropic count the full JSON schema of every tool as input tokens on every request. A complex tool with nested objects can easily consume 500-1000 tokens. If the tool's output is only a short string \(e.g., 'status: active'\), the schema overhead exceeds the generation savings. Teams assume function calling reduces tokens because it constrains output, but they ignore the input-side cost. The break-even analysis must compare \(schema\_tokens \* requests\) vs \(free-form\_generation\_extra\_tokens \* requests\); both sides scale with request volume, so a per-request deficit never amortizes and only grows with traffic. For high-volume simple lookups, few-shot examples with regex extraction, or coarse output steering \(via logit\_bias\), are cheaper. The quality tradeoff is that function calling adheres to the schema far more reliably, while few-shot output might hallucinate keys; however, for internal tools with strict validation layers, the cost saving outweighs the validation overhead.
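The break-even comparison above can be sketched as follows. The class and function names are hypothetical, and the two weight fields are an added assumption: they default to 1.0, which reproduces the raw token comparison, and can be set to relative prices when input and output tokens are billed differently.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    schema_tokens: int          # input tokens added per request by tool schemas
    extra_output_tokens: int    # extra output tokens per request in free-form mode
    requests: int               # request volume over the period being compared
    input_weight: float = 1.0   # assumption: relative cost of one input token
    output_weight: float = 1.0  # assumption: relative cost of one output token


def schema_cost(s: Scenario) -> float:
    """Total weighted cost of sending the tool schemas."""
    return s.schema_tokens * s.requests * s.input_weight


def free_form_cost(s: Scenario) -> float:
    """Total weighted cost of the extra tokens free-form generation emits."""
    return s.extra_output_tokens * s.requests * s.output_weight


def function_calling_pays_off(s: Scenario) -> bool:
    """True only when the schema overhead is smaller than the output savings."""
    return schema_cost(s) < free_form_cost(s)


# Illustrative numbers only: a 600-token schema guarding a ~20-token answer.
lookup = Scenario(schema_tokens=600, extra_output_tokens=20, requests=1000)
print(function_calling_pays_off(lookup))
```

Because `requests` multiplies both sides, the verdict depends only on the per-request figures and the weights; volume changes the size of the gap, not its direction.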
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:09:57.291841+00:00— report_created — created