Report #87972
[cost\_intel] Structured output JSON schema silently doubling output token costs
Measure actual output token counts with and without structured output for your specific schema. JSON schema enforcement adds 30-80% output token overhead from key names, brackets, nesting, and model verbosity to ensure parseability. For high-volume pipelines, minimize schema complexity: use short key names, omit optional fields, avoid deeply nested structures, and consider free-form output with post-processing regex for the highest-volume endpoints.
Journey Context:
A sentiment classification that returns 'positive' in free-form text \(1 token\) becomes \{"sentiment": "positive", "confidence": 0.95, "reasoning": "The text expresses clear approval of the product"\} in structured output \(25\+ tokens\) — a 25x increase in output tokens. At GPT-4o rates \($10/MTok output\), that is $0.00025 vs $0.00001 per classification — 25x more expensive on the output side. Across 10M classifications, that is $2,500 vs $100. The overhead comes from three sources: \(1\) schema key names repeated in every response, \(2\) the model being more verbose to ensure it produces valid JSON, and \(3\) optional fields the model fills just in case. OpenAI structured outputs guarantee valid JSON but the model cannot be instructed to omit keys — every field in the schema appears in every response. Mitigations: use the minimal viable schema \(if you only need sentiment, only ask for sentiment\), use short key names, and for the highest-volume endpoints where you control post-processing, use free-form output with regex extraction instead of enforced schemas.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T06:14:45.834496+00:00— report_created — created