Report #46640
[cost_intel] When does OpenAI's Structured Outputs (guaranteed JSON schema) increase latency and cost compared with manual parsing?
Use Structured Outputs only when invalid JSON would cause catastrophic downstream failures. Constrained decoding adds 20-50% latency and can increase token count through 'reroll' (backtracking) mechanisms, but it cuts the invalid-output rate from ~8% (manual parsing) to <0.5%. That trade-off is ROI-positive for financial data extraction but wasteful for UI generation.
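To make the ROI argument concrete, here is a minimal expected-cost sketch. It is a simplification, not a measured model: it treats the pessimistic end of the 20-50% overhead as a 1.5x per-call cost multiplier (the report quotes that overhead as latency, so this mapping is an assumption), uses the report's ~8% and <0.5% error rates, and takes the per-call price and the downstream failure costs as hypothetical inputs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParsingStrategy:
    name: str
    cost_multiplier: float  # per-call cost relative to a plain completion (1.0 = no overhead)
    error_rate: float       # fraction of outputs that are invalid and reach downstream code


def expected_cost(s: ParsingStrategy, call_cost: float, failure_cost: float) -> float:
    """Expected cost of one request: the call itself plus the expected downstream damage."""
    return call_cost * s.cost_multiplier + s.error_rate * failure_cost


def break_even_failure_cost(a: ParsingStrategy, b: ParsingStrategy, call_cost: float) -> float:
    """Downstream failure cost at which strategies a and b have equal expected cost."""
    return call_cost * (b.cost_multiplier - a.cost_multiplier) / (a.error_rate - b.error_rate)


# Error rates from this report; the 1.5x multiplier is an assumed, pessimistic mapping of the overhead.
MANUAL = ParsingStrategy("manual parsing", cost_multiplier=1.0, error_rate=0.08)
STRUCTURED = ParsingStrategy("structured outputs", cost_multiplier=1.5, error_rate=0.005)

if __name__ == "__main__":
    call_cost = 0.002  # hypothetical $ per call
    be = break_even_failure_cost(MANUAL, STRUCTURED, call_cost)
    print(f"break-even failure cost: ${be:.4f}")
    # Hypothetical failure costs: a broken UI render is just regenerated; a bad financial record is not.
    for label, failure_cost in [("UI generation", 0.0), ("financial extraction", 10.0)]:
        m = expected_cost(MANUAL, call_cost, failure_cost)
        s = expected_cost(STRUCTURED, call_cost, failure_cost)
        print(f"{label}: manual ${m:.4f} vs structured ${s:.4f}")
```

With these assumed inputs the break-even downstream failure cost is about $0.013 per bad record: below it (for example, a UI that can simply re-render), manual parsing is cheaper; above it (a mis-extracted financial figure), Structured Outputs wins.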
Journey Context:
Developers often assume Structured Outputs provides 'free' schema enforcement. Under the hood, OpenAI uses constrained decoding: at each generation step, the logits are restricted to tokens that keep the output valid under the JSON schema. This prevents invalid JSON, but it can force the model into suboptimal token choices and increase the token count. Additionally, if the model generates an invalid token (rare, but possible), the system may backtrack ('reroll'), effectively doubling the token cost for that segment. For simple schemas (e.g. {"sentiment": "positive"}), manually parsing free text is cheaper and faster. For nested schemas with enums and conditionals, unconstrained generation fails often enough that the retry logic it needs costs more than the Structured Outputs overhead. The break-even point depends on schema complexity: more than 4 nesting levels or more than 10 constraints favors Structured Outputs.
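The logit-masking mechanism can be illustrated with a toy greedy decoder. This is not OpenAI's implementation; the whole-word vocabulary, the one-field enum schema and the `toy_logits` stand-in model are all hypothetical. It shows both sides of the trade-off: the mask guarantees a schema-valid result, but it also overrides the model's preferred answer ("mixed", which the enum forbids).

```python
import math
from typing import Callable, Dict, List

# Hypothetical toy setup: whole-word "tokens" and a one-field enum schema.
VALID_OUTPUTS = [f'{{"sentiment": "{v}"}}' for v in ("positive", "negative", "neutral")]
VOCAB = ['{"sentiment": "', "positive", "Positive", "negative", "neutral", "mixed", '"}']


def allowed_tokens(prefix: str) -> List[str]:
    """Tokens that keep the output a prefix of at least one schema-valid string."""
    return [t for t in VOCAB if any(v.startswith(prefix + t) for v in VALID_OUTPUTS)]


def toy_logits(prefix: str) -> Dict[str, float]:
    """Stand-in for the model: it 'wants' to answer "mixed", which the enum forbids."""
    scores = {t: 0.0 for t in VOCAB}
    if prefix == "":
        scores['{"sentiment": "'] = 5.0
    elif prefix.endswith('"sentiment": "'):
        scores.update({"mixed": 4.0, "Positive": 3.0, "neutral": 2.5, "positive": 2.0})
    else:
        scores['"}'] = 5.0
    return scores


def greedy_decode(logits_fn: Callable[[str], Dict[str, float]], constrained: bool,
                  max_steps: int = 10) -> str:
    out = ""
    for _ in range(max_steps):
        logits = logits_fn(out)
        if constrained:
            allowed = set(allowed_tokens(out))
            # Mask: schema-violating tokens get -inf, so greedy selection can never pick them.
            logits = {t: (s if t in allowed else -math.inf) for t, s in logits.items()}
        out += max(logits, key=logits.get)
        if out.endswith('"}'):
            break
    return out


if __name__ == "__main__":
    print("unconstrained:", greedy_decode(toy_logits, constrained=False))
    print("constrained:  ", greedy_decode(toy_logits, constrained=True))
```

The unconstrained run yields syntactically valid JSON that still violates the enum; the constrained run is schema-valid but lands on the model's third choice.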
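For the simple-schema case, manual parsing plus a bounded retry loop might look like the following sketch. `parse_sentiment`, `extract_with_retries` and the `generate` callable (standing in for whatever function calls the model) are hypothetical names. The non-greedy regex only handles flat objects, which is part of why this approach gets fragile as schemas nest.

```python
import json
import re
from typing import Callable, Optional, Tuple

ALLOWED = {"positive", "negative", "neutral"}


def parse_sentiment(text: str) -> Optional[str]:
    """Pull the first flat {...} object out of free text and validate it; None on failure."""
    match = re.search(r"\{.*?\}", text, re.DOTALL)  # non-greedy: breaks on nested objects
    if not match:
        return None
    try:
        obj = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    value = obj.get("sentiment") if isinstance(obj, dict) else None
    # Enforce the enum by hand, since nothing constrained the model's output.
    return value if isinstance(value, str) and value in ALLOWED else None


def extract_with_retries(generate: Callable[[int], str],
                         max_attempts: int = 3) -> Tuple[Optional[str], int]:
    """Call `generate` until its output parses; return (value, attempts_used)."""
    for attempt in range(1, max_attempts + 1):
        value = parse_sentiment(generate(attempt))
        if value is not None:
            return value, attempt
    return None, max_attempts
```

With unbounded retries and independent failures at probability p, the expected number of calls per successful extraction is 1/(1 - p): about 1.09 at the report's ~8% error rate. That multiplier, plus the latency of each extra round trip, is the retry cost weighed against the Structured Outputs overhead.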
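The break-even rule can be encoded as a quick check on a JSON Schema before choosing an approach. The report does not define what counts as a "constraint", so the `CONSTRAINT_KEYWORDS` set below is a hypothetical choice, and the traversal ignores `$ref`/`$defs`; the 4-level and 10-constraint thresholds are the report's rule of thumb, not a measured cutoff.

```python
from typing import Any, Dict, List

# Hypothetical: which JSON Schema keywords count as one "constraint" for the rule of thumb.
CONSTRAINT_KEYWORDS = {"enum", "const", "pattern", "format", "minimum", "maximum",
                       "minLength", "maxLength", "minItems", "maxItems", "anyOf", "required"}


def _children(schema: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Sub-schemas reachable via properties, items, or anyOf (ignores $ref/$defs)."""
    kids = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        kids.append(schema["items"])
    kids.extend(schema.get("anyOf", []))
    return kids


def nesting_depth(schema: Dict[str, Any]) -> int:
    """Number of nested object/array levels; a flat object has depth 1."""
    own = 1 if schema.get("type") in ("object", "array") else 0
    return own + max((nesting_depth(c) for c in _children(schema)), default=0)


def count_constraints(schema: Dict[str, Any]) -> int:
    """Count constraint keywords on this schema and all reachable sub-schemas."""
    here = sum(1 for key in schema if key in CONSTRAINT_KEYWORDS)
    return here + sum(count_constraints(c) for c in _children(schema))


def favors_structured_outputs(schema: Dict[str, Any], max_depth: int = 4,
                              max_constraints: int = 10) -> bool:
    """Rule of thumb from this report: >4 nesting levels or >10 constraints."""
    return nesting_depth(schema) > max_depth or count_constraints(schema) > max_constraints
```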
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:45:37.491133+00:00— report_created — created