Report #63027
[cost_intel] Hidden token bloat in JSON mode and function calling
Account for a 20-30% hidden token overhead when using OpenAI JSON mode or function calling instead of raw completions. On high-volume pipelines, this overhead can erase the savings from switching to a smaller model.
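A minimal sketch of folding the overhead into a pipeline estimate follows. It assumes the hidden tokens are billed as extra prompt tokens equal to a flat fraction of the visible prompt, and that prices are quoted per million tokens. The function names and all prices used with them are hypothetical placeholders, not OpenAI's published rates.

```python
def pipeline_cost(records, prompt_tokens, completion_tokens,
                  price_in_per_m, price_out_per_m, overhead=0.0):
    """Estimated pipeline cost, in the currency of the prices.

    Assumption: hidden tokens are billed as extra prompt tokens equal to
    `overhead` times the visible prompt (e.g. 0.25 for 25%).
    Prices are per million tokens.
    """
    billed_prompt = prompt_tokens * (1.0 + overhead)
    per_record = (billed_prompt * price_in_per_m
                  + completion_tokens * price_out_per_m) / 1_000_000
    return records * per_record


def breakeven_overhead(prompt_tokens, completion_tokens,
                       small_in, small_out, large_in, large_out):
    """Overhead at which the small model (with overhead) costs the same
    per record as the large model (without overhead)."""
    large_cost = prompt_tokens * large_in + completion_tokens * large_out
    small_completion_cost = completion_tokens * small_out
    # Solve prompt * (1 + o) * small_in + small_completion_cost == large_cost for o.
    return (large_cost - small_completion_cost) / (prompt_tokens * small_in) - 1.0
```

With hypothetical prices of 0.80/1.00 (small model) and 1.00/1.20 (large model) per million input/output tokens, and 800 prompt plus 100 completion tokens per record, `breakeven_overhead(800, 100, 0.80, 1.00, 1.00, 1.20)` returns 0.28125. Under those assumptions, an overhead in the 20-30% range is enough to cancel the smaller model's advantage. The result is very sensitive to the price ratio, so plug in current published rates before drawing conclusions.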
Journey Context:
Engineers estimate costs from the prompt and response tokens they can see, unaware that JSON mode and function calling inject hidden schema-validation tokens and 'implicit function' descriptions into the context window. On extraction pipelines processing millions of records, this silent overhead of up to 30% can make GPT-3.5-turbo with JSON mode more expensive than GPT-4-turbo without it for equivalent throughput.
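One way to replace a guessed overhead with a measured one is an A/B sample. Send the same records once as plain chat completions and once with JSON mode (`response_format={"type": "json_object"}`) or tools enabled, then compare the `usage.prompt_tokens` each response reports. The sketch below only does the arithmetic on those reported counts. The function names and sample numbers are hypothetical, and the method attributes every extra billed prompt token to the structured-output setting, including any schema or tool definitions you pass explicitly.

```python
from statistics import mean


def overhead_fraction(baseline_prompt_tokens, structured_prompt_tokens):
    """Relative extra prompt tokens billed for the structured request.

    baseline_prompt_tokens: usage.prompt_tokens of the plain request.
    structured_prompt_tokens: usage.prompt_tokens of the same request
    sent with JSON mode or function calling enabled.
    """
    if baseline_prompt_tokens <= 0:
        raise ValueError("baseline_prompt_tokens must be positive")
    return structured_prompt_tokens / baseline_prompt_tokens - 1.0


def mean_overhead(samples):
    """Average overhead over (baseline, structured) token-count pairs."""
    return mean(overhead_fraction(b, s) for b, s in samples)


# Hypothetical token counts from two sampled records.
samples = [(400, 500), (800, 960)]
print(f"measured overhead: {mean_overhead(samples):.1%}")  # prints: measured overhead: 22.5%
```

If the injected tokens are roughly fixed per request (an assumption), the fraction shrinks as prompts grow. Sample records that are representative of the pipeline rather than a single short one.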
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:16:19.841930+00:00 - report_created - created