Report #30155
[cost\_intel] Why is OpenAI's new 'structured outputs' with strict: true slower and more expensive than old JSON mode?
Structured outputs with strict: true \(strict schema\) rely on constrained decoding, which can increase Time-To-First-Token \(TTFT\) by 20-50%. The price per token is the same as JSON mode, and retry costs go down because every response conforms to the schema. For high-volume extraction, use the 'json\_object' response\_format with careful prompting and client-side validation instead of strict structured outputs, saving roughly 30% on latency. Reserve strict: true for cases where downstream parsers would crash on malformed JSON \(e.g., direct database writes\). The overhead comes from enforcing the JSON schema grammar during generation: at each step, tokens that would violate the schema are excluded from the probability distribution before the next token is sampled.
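A minimal sketch of the two options compared above, assuming the Chat Completions `response_format` shapes (`json_object` and `json_schema` with `strict: True`). The invoice schema and the helper names `json_mode_format`, `strict_format`, and `validate_flat` are hypothetical and exist only for illustration. The validator is a deliberately small stand-in for a real JSON Schema library, and no API call is made.

```python
import json

# Hypothetical flat extraction schema, used only for illustration.
# Strict mode expects every property listed in "required" and
# "additionalProperties" set to False.
INVOICE_SCHEMA = {
    "type": "object",
    "properties": {
        "vendor": {"type": "string"},
        "total": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["vendor", "total", "currency"],
    "additionalProperties": False,
}


def json_mode_format() -> dict:
    """response_format for plain JSON mode: no schema is enforced."""
    return {"type": "json_object"}


def strict_format(name: str, schema: dict) -> dict:
    """response_format for structured outputs with strict schema adherence."""
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


def validate_flat(raw: str, schema: dict):
    """Client-side check for JSON mode output against a flat schema.

    Returns (parsed_object, errors); parsed_object is None when errors exist.
    Only handles "string" and "number" fields (an assumption of this sketch).
    """
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(obj, dict):
        return None, ["top-level value is not an object"]

    type_map = {"string": str, "number": (int, float)}
    errors = []
    for key in schema["required"]:
        if key not in obj:
            errors.append(f"missing field: {key}")
    for key, value in obj.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unexpected field: {key}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, type_map[spec["type"]]):
            errors.append(f"wrong type for {key}")
    return (obj if not errors else None), errors
```

Either dictionary would be passed as the `response_format` argument of a chat completion request. With JSON mode, the prompt itself should also ask for JSON explicitly, and the error list returned by `validate_flat` can be fed back to the model on a retry.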
Journey Context:
Developers assume 'strict is better because it guarantees format' but don't account for the latency tax. A common mistake is to switch all extraction to strict mode and then wonder why pipeline throughput dropped from 1000 RPM to 600 RPM. The alternative, JSON mode with retry loops, has hidden costs of its own: if 5% of responses fail parsing and each needs a retry with error feedback, the effective cost approaches that of strict mode anyway. For schemas with many nested optional fields, strict mode overhead grows superlinearly. Best practice: use strict mode for simple flat schemas \(3-5 fields\) where grammar constraints are cheap, and use JSON mode with validators for deeply nested objects.
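The retry arithmetic above can be sketched as follows. This is a back-of-the-envelope model, not a measurement: it assumes each attempt fails independently with the same probability, and that a retry costs a fixed multiple of the first call because the error feedback lengthens the prompt. The function names and the 1.5x retry multiplier are illustrative assumptions.

```python
def expected_attempts(failure_rate: float, max_retries: int) -> float:
    """Expected number of API calls per request.

    Attempt k+1 only happens if the previous k attempts all failed,
    which has probability failure_rate ** k.
    """
    return sum(failure_rate ** k for k in range(max_retries + 1))


def expected_cost(
    base_cost: float,
    failure_rate: float,
    max_retries: int,
    retry_cost_multiplier: float = 1.0,
) -> float:
    """Expected cost per request when retries cost more than the first call."""
    expected_retries = sum(failure_rate ** k for k in range(1, max_retries + 1))
    return base_cost * (1 + retry_cost_multiplier * expected_retries)


def residual_failure_rate(failure_rate: float, max_retries: int) -> float:
    """Probability that a request still fails after all retries are used."""
    return failure_rate ** (max_retries + 1)


if __name__ == "__main__":
    # 5% parse failures, up to 2 retries, retries assumed 1.5x as expensive.
    print(expected_attempts(0.05, 2))
    print(expected_cost(1.0, 0.05, 2, retry_cost_multiplier=1.5))
    print(residual_failure_rate(0.05, 2))
```

Under these assumptions a 5% failure rate adds about 5% more calls and roughly 8% more spend per request, plus the latency of the retried calls. Whether that is cheaper than strict mode depends on the failure rate actually observed for a given schema.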
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:00:11.194066+00:00— report_created — created