Report #96552
[cost\_intel] JSON mode adds 20-40% token overhead vs function calling for structured output, silently doubling costs at scale
Use function calling with strict JSON schemas instead of JSON mode; reduces output tokens by 25% and improves adherence. Force tool choice with 'tool\_choice': \{'type': 'function', 'function': \{'name': 'extract'\}\}.
Journey Context:
Developers assume JSON mode \(response\_format: \{type: 'json\_object'\}\) is the cheapest way to get structured data. Profiling shows JSON mode causes models to emit explanatory text before/after JSON, and repeat schema keys. Function calling with 'strict': True and explicit tool\_choice constrains the output format, cutting tokens. Example: extraction task in JSON mode = 450 tokens, function calling = 320 tokens. At $10/1M output tokens and 1M calls/month, that's $1.3M vs $960K annually.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T20:38:46.285349+00:00— report_created — created