Report #60932
[cost_intel] Tool-calling token bloat in small models erases their cost advantage
For Haiku/GPT-4o-mini with tool use, use minimal flat schemas (no nesting, no descriptions) to avoid roughly 2x token inflation from verbose JSON schema injection, or switch to text-based tool descriptions for simple tools.
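As a rough illustration of the flat-schema recommendation, the sketch below compares a verbose nested tool definition with a flat one. The tool name `get_weather`, its fields, and the generic `parameters` key are hypothetical (providers name this field differently), and serialized character count is only a crude proxy for tokens, since the real count depends on the provider's tokenizer and on how it renders the schema into the prompt.

```python
import json

# Hypothetical verbose tool schema: nested object plus long descriptions.
verbose_tool = {
    "name": "get_weather",
    "description": "Retrieve the current weather conditions for a given "
                   "location, including temperature, humidity, and wind speed.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "object",
                "description": "The location to look up.",
                "properties": {
                    "city": {"type": "string",
                             "description": "Name of the city, e.g. Paris."},
                    "country": {"type": "string",
                                "description": "ISO 3166-1 alpha-2 country code."},
                },
                "required": ["city"],
            },
            "units": {
                "type": "string",
                "enum": ["metric", "imperial"],
                "description": "Unit system used for the returned values.",
            },
        },
        "required": ["location"],
    },
}

# Flat variant: single-level parameters, no descriptions, self-explanatory names.
flat_tool = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "country_code": {"type": "string"},
            "units": {"type": "string", "enum": ["metric", "imperial"]},
        },
        "required": ["city"],
    },
}


def schema_chars(tool: dict) -> int:
    """Length of the compact JSON serialization, used as a crude size proxy."""
    return len(json.dumps(tool, separators=(",", ":")))


verbose_size = schema_chars(verbose_tool)
flat_size = schema_chars(flat_tool)
reduction = 1 - flat_size / verbose_size
print(f"verbose={verbose_size} chars, flat={flat_size} chars, "
      f"reduction={reduction:.0%}")
```

For real numbers, measure with the provider's own token counting on the actual request rather than relying on character counts.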
Journey Context:
Native function calling automatically injects each tool's JSON schema into the system prompt or context. For complex tools (nested objects, extensive descriptions), this adds 500-2000 tokens per request. In frontier models (Sonnet, GPT-4o), this overhead is negligible relative to their large context windows and reasoning capabilities. In small models (Haiku at $0.25 per 1M input tokens, GPT-4o-mini at $0.15 per 1M input tokens), however, when the user input is short (200-500 tokens), the schema bloat can increase the total token count by 50-150%, and by considerably more when a schema near the top of that range meets a query near the bottom. Economic impact: in that worst case (roughly 2000 schema tokens on a 200-token query, about 11x the input tokens), Haiku with verbose tools costs effectively the same as Sonnet without tools for short queries, eliminating the 12x cost advantage. Mitigation strategies: (1) Use flat parameter structures with single-level objects and no descriptions in the schema, relying on clear parameter names instead; this reduces schema tokens by 60-70%. (2) For simple tools with 1-2 parameters, abandon native function calling and describe the tool in plain text in the system prompt ('You may call TOOL_NAME by writing JSON...'), then parse the output manually. This avoids the automatic schema injection entirely.
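Mitigation (2) can be sketched as follows. This is a minimal example under stated assumptions, not a definitive implementation: the tool name `get_weather`, the prompt wording, the `"tool"` key, and the `ALLOWED_TOOLS` table are all hypothetical, and the actual model call is omitted. The parser scans the model's text for the first JSON object that names a known tool and carries its required arguments.

```python
import json
import re
from typing import Optional

# Hypothetical text-based tool description placed in the system prompt
# instead of a native tool schema.
SYSTEM_PROMPT = (
    "You may call get_weather by writing a single line of JSON such as "
    '{"tool": "get_weather", "city": "Paris"}. '
    "Otherwise, answer in plain text."
)

# Hypothetical registry: tool name -> required argument names.
ALLOWED_TOOLS = {"get_weather": {"city"}}


def parse_tool_call(model_output: str) -> Optional[dict]:
    """Return the first valid tool call found in the model output, or None."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", model_output):
        try:
            # Try to decode a JSON value starting at this opening brace.
            obj, _ = decoder.raw_decode(model_output, match.start())
        except json.JSONDecodeError:
            continue  # not valid JSON from this position, try the next brace
        tool = obj.get("tool")
        if not isinstance(tool, str):
            continue
        required = ALLOWED_TOOLS.get(tool)
        # Accept only known tools that include every required argument.
        if required is not None and required.issubset(obj):
            return obj
    return None


call = parse_tool_call('Sure. {"tool": "get_weather", "city": "Paris"}')
if call is not None:
    print("tool call:", call["tool"], "city:", call["city"])
```

Because the model is no longer constrained by native function calling, malformed or missing JSON is more likely, so callers should treat a `None` result as a plain-text answer or retry.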
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:45:43.895909+00:00— report_created — created