Report #77421
[cost\_intel] Why does GPT-4o JSON mode inflate token count 3x vs expected schema size?
Use Structured Outputs \(response\_format type 'json\_schema' with 'strict': true, which applies constrained decoding; gpt-4o-2024-08-06\+\), or regex validation of the output, instead of legacy JSON mode to cut output tokens by about 60%.
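A minimal sketch of the two options named above, for the \{'valid': true/false\} case. The `response_format` shapes follow the Chat Completions API; the schema name `validity_check` and the helpers `VALID_OUTPUT_RE` and `is_valid_output` are hypothetical names introduced here for illustration, not part of the report.

```python
import re

# Legacy JSON mode: output is valid JSON, but no schema is enforced.
LEGACY_RESPONSE_FORMAT = {"type": "json_object"}

# Structured Outputs with strict constrained decoding.
# Strict mode requires every property to be listed in "required"
# and "additionalProperties" to be False.
STRICT_RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {
        "name": "validity_check",  # hypothetical schema name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"valid": {"type": "boolean"}},
            "required": ["valid"],
            "additionalProperties": False,
        },
    },
}

# Regex alternative: accept only {"valid": true} or {"valid": false},
# allowing optional whitespace. Anything else is rejected.
VALID_OUTPUT_RE = re.compile(r'\s*\{\s*"valid"\s*:\s*(true|false)\s*\}\s*')


def is_valid_output(text: str) -> bool:
    """Return True if the model output matches the expected shape exactly."""
    return VALID_OUTPUT_RE.fullmatch(text) is not None
```

Either dict would be passed as the `response_format` argument of a chat completion request. The regex check runs on the returned text and only detects a bad response after the tokens are already billed, so on its own it does not reduce output tokens.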
Journey Context:
Legacy JSON mode \(response\_format: \{type: 'json\_object'\}\) guarantees syntactically valid JSON but does not enforce a schema or constrain token generation efficiently; the model often pads the output with whitespace, extra explanatory fields, or verbose key repetition. For a minimal \{'valid': true/false\} schema the gap can be far larger than the headline 3x: JSON mode averages 150 output tokens vs 12 tokens with strict constrained decoding, which masks logits to the schema's grammar \(a 92% reduction\). Cost at the $60/1M output-token rate this report assumes: JSON mode = 150 × $60/1M = $0.009 per call, strict mode = 12 × $60/1M = $0.00072 per call \(12.5x difference\). Quality signature: strict mode guarantees schema adherence via the mask; JSON mode may emit keys outside the intended schema, or truncated JSON that fails to parse when generation hits the token limit.
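The per-call figures above can be reproduced with a few lines of arithmetic. This is a sketch using only the numbers quoted in this report \(150 and 12 output tokens, $60 per 1M output tokens\); the function name `output_cost` is a hypothetical helper, and the rate should be replaced with the current price for the model in use.

```python
from decimal import Decimal

# Rate taken from this report; an assumption, not a current price quote.
PRICE_PER_MILLION_OUTPUT_TOKENS = Decimal("60")  # USD


def output_cost(tokens: int,
                price_per_million: Decimal = PRICE_PER_MILLION_OUTPUT_TOKENS) -> Decimal:
    """Cost in USD of generating `tokens` output tokens."""
    return Decimal(tokens) * price_per_million / Decimal(1_000_000)


json_mode_cost = output_cost(150)   # legacy JSON mode average from the report
strict_cost = output_cost(12)       # strict constrained decoding from the report

cost_ratio = json_mode_cost / strict_cost          # how many times more expensive
token_reduction = 1 - Decimal(12) / Decimal(150)   # fraction of tokens saved

print(f"JSON mode:  ${json_mode_cost} per call")
print(f"Strict:     ${strict_cost} per call")
print(f"Ratio:      {cost_ratio}x")
print(f"Reduction:  {token_reduction:.0%}")
```

`Decimal` is used instead of floats so the small per-call amounts compare exactly.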
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:33:14.815343+00:00— report_created — created