Report #49053
[cost_intel] Why do OpenAI JSON mode and function calling silently raise token costs 3-10x vs. raw completion?
JSON mode adds 20-40% overhead in output tokens due to schema enforcement; function calling (tools) can double input tokens by injecting the tool schemas into the context. For complex nested schemas (>5 levels), the token count explodes: a 500-token completion becomes 3000+ tokens (system overhead + schema repetition). Mitigate by setting 'strict': false where possible, flattening schemas, or switching to raw completions with regex validation for simple structures.
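As a rough illustration of the "raw completion plus regex validation" mitigation, the sketch below assumes the prompt asks the model for one line in a made-up `name=...; amount=...` format. The format, the pattern, and the function name `parse_extraction` are hypothetical and not part of any OpenAI API; the API call itself is omitted.

```python
import re
from typing import Optional

# Hypothetical flat output format requested in the prompt:
#   name=<text>; amount=<number>
# This is an assumption for illustration, not a format OpenAI defines.
LINE_PATTERN = re.compile(
    r"^name=(?P<name>[^;]+);\s*amount=(?P<amount>-?\d+(?:\.\d+)?)$"
)


def parse_extraction(raw: str) -> Optional[dict]:
    """Return the parsed fields, or None if the completion does not match."""
    match = LINE_PATTERN.match(raw.strip())
    if match is None:
        # Caller decides whether to retry, fall back to JSON mode, or drop.
        return None
    return {
        "name": match.group("name").strip(),
        "amount": float(match.group("amount")),
    }
```

A `None` result is the signal to retry or fall back, so the retry rate should be measured before assuming this is cheaper than native JSON mode.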
Journey Context:
Teams see 'JSON mode ensures valid output' and enable it universally. The unseen cost: OpenAI duplicates the JSON schema into the prompt context on every request (function calling) or constrains the sampler (JSON mode). A schema defining 10 fields with descriptions adds 800-1500 tokens to the prompt. At $10 per 1M tokens (the rate this report assumes for 4o), a 1500-token schema is a hidden cost of $0.015 per request; at 1M requests per day, that is $15k/day of overhead. JSON mode (non-function) has less overhead but still forces verbose formatting (quotes, braces). For high-volume extraction, use raw completions with Pydantic post-validation; use native JSON mode only when schema complexity requires it (>10 fields or nested objects) and the cost of validation failures exceeds the token overhead.
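The overhead arithmetic above can be reproduced with a small helper. This is a minimal sketch: the function name is made up for illustration, and the $10 per 1M tokens rate is the report's assumption, so substitute the current price for your model.

```python
def schema_overhead_cost(
    schema_tokens: int,
    price_per_million_usd: float,
    requests_per_day: int,
) -> tuple[float, float]:
    """Return (cost per request, cost per day) in USD for schema tokens alone."""
    per_request = schema_tokens * price_per_million_usd / 1_000_000
    per_day = per_request * requests_per_day
    return per_request, per_day


# Upper and lower ends of the 800-1500 token range quoted in the report,
# priced at the report's assumed $10 per 1M tokens.
high_request, high_day = schema_overhead_cost(1500, 10.0, 1_000_000)
low_request, low_day = schema_overhead_cost(800, 10.0, 1_000_000)

print(f"1500-token schema: ${high_request:.3f}/request, ${high_day:,.0f}/day")
print(f" 800-token schema: ${low_request:.3f}/request, ${low_day:,.0f}/day")
```

With the 1500-token upper bound this gives the $0.015 per request and $15k/day quoted above; the 800-token lower bound comes to about $0.008 per request and $8k/day.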
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:49:13.905392+00:00— report_created — created