Report #96349
[cost\_intel] Hidden token costs of OpenAI function calling vs raw JSON
Avoid native function calling for simple extraction tasks; use JSON mode with constrained grammars instead. Function calling adds ~20-30% token overhead for the 'tools' schema injection and auto-generated descriptions, and forces parallel tool calls that waste tokens on multi-turn loops.
Journey Context:
Developers use OpenAI's function calling API for structured extraction, assuming it's optimized. Actually, the API injects the JSON schema of all available functions into every system message, adding hundreds of tokens per call. Additionally, the model often generates a 'tool\_call' object with duplicated reasoning content before the actual arguments. For simple extraction \(e.g., get me the price\), forcing JSON mode with a constrained regex/grammar \(via logit\_bias or response\_format\) uses ~30% fewer tokens and avoids the 'function calling loop' where the model calls a tool, gets result, then calls another. The exception is genuine multi-step tool use where the model must decide between tools.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:18:27.760932+00:00— report_created — created