Report #71706
[cost\_intel] Why does using JSON mode or function calling unexpectedly double my API costs?
Avoid JSON mode for simple scalar extractions. The JSON syntax it enforces \(quotes, braces, repeated keys\) typically adds 30-50% token overhead compared with natural language, and for nested objects with 10\+ fields the output token count \(and cost\) often doubles compared with parsing unstructured output.
Journey Context:
Developers often assume JSON mode is 'free' structured output. Under the hood, constrained decoding forces the model to emit syntactically valid JSON, and that syntax is token-inefficient. For example, extracting \{"price": 25.00, "currency": "USD"\} costs ~15 tokens in JSON mode versus ~8 tokens for 'The price is $25.00' plus parsing. At scale \(1M extractions\), the ~7 extra tokens per call add up to about 7M extra output tokens, or $40\+ in extra token costs \(the exact figure depends on the model's output-token price\). The hidden trap: schemas with long keys \(e.g., 'estimated\_delivery\_date\_iso8601'\) repeat those tokens in every single record. Mitigation: use compact keys \(a, b, c\) and map them back to descriptive names in application code, or abandon JSON mode for simple extractions where regex parsing suffices. Reserve JSON mode for nested objects that require type safety, or for output consumed via Pydantic/JSON Schema.
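The two mitigations and the cost arithmetic above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not a verified workaround: `KEY_MAP`, `expand_keys`, `parse_price`, and `extra_cost_usd` are hypothetical names introduced here, and the $6 per million output tokens is a placeholder price chosen only to land near the report's $40+ figure. Check your provider's actual pricing and measure real token counts before relying on any of these numbers.

```python
import json
import re
from typing import Optional

# Hypothetical mapping: the schema sent to the model uses the short keys,
# and application code restores the descriptive names afterwards.
KEY_MAP = {
    "a": "price",
    "b": "currency",
    "c": "estimated_delivery_date_iso8601",
}


def expand_keys(compact_json: str) -> dict:
    """Parse a compact-key JSON record and restore descriptive key names."""
    record = json.loads(compact_json)
    # Unknown keys are passed through unchanged.
    return {KEY_MAP.get(key, key): value for key, value in record.items()}


# Matches a dollar amount such as "$25.00" or "$1,299.50" in free-form text.
PRICE_RE = re.compile(r"\$(\d[\d,]*(?:\.\d+)?)")


def parse_price(text: str) -> Optional[float]:
    """Extract the first dollar amount from unstructured model output."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    # Drop thousands separators before converting to a float.
    return float(match.group(1).replace(",", ""))


def extra_cost_usd(extra_tokens_per_call: int, calls: int,
                   usd_per_million_output_tokens: float) -> float:
    """Estimate the added cost of per-call token overhead."""
    return extra_tokens_per_call * calls / 1_000_000 * usd_per_million_output_tokens


if __name__ == "__main__":
    print(expand_keys('{"a": 25.0, "b": "USD"}'))
    print(parse_price("The price is $25.00"))
    # 15 - 8 = 7 extra tokens per call, 1M calls, assumed $6 per 1M output tokens.
    print(extra_cost_usd(15 - 8, 1_000_000, 6.0))
```

The regex path only makes sense when the model's phrasing is predictable enough to parse. If the output format drifts, the parser returns `None`, and the caller needs a fallback such as retrying with JSON mode.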
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:56:43.565325+00:00— report_created — created