Report #58272
[cost_intel] Using free-form text output when structured data is what you actually need from the model
Use a structured output mode (JSON mode, function calling, or structured outputs) to constrain model responses. This cuts output tokens by 30-50% and removes the 5-15% of requests that would otherwise be retried because of malformed output. At $15 per million output tokens, saving 400 output tokens per request across 1M requests saves $6,000 (400M tokens × $15/M), before counting the avoided retries.
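The arithmetic behind these figures can be checked with a short sketch. The function names are illustrative, and the retry model (independent failures, retry until success) is an assumption not stated in the report; real retry policies are usually capped.

```python
def output_cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of `tokens` output tokens at a per-million-token price."""
    return tokens * price_per_million_usd / 1_000_000


def expected_attempts(failure_rate: float) -> float:
    """Mean attempts per request when retrying until success.

    Assumes independent failures (geometric distribution): 1 / (1 - p).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


# Figures from the report: 400 tokens saved per request, 1M requests, $15/M.
direct_savings = output_cost_usd(400 * 1_000_000, 15.0)
print(f"direct savings: ${direct_savings:,.0f}")  # direct savings: $6,000

# Retry overhead at the report's 5% and 15% malformation rates.
for p in (0.05, 0.15):
    overhead = expected_attempts(p) - 1.0
    print(f"failure rate {p:.0%}: about {overhead:.1%} extra spend")
```

Under that assumed retry model, a 5-15% malformation rate adds roughly 5% to 18% to total spend, on top of the direct token savings.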
Journey Context:
Without structured output, models pad responses with conversational filler: introductory phrases, hedging language, explanatory asides. That filler accounts for the 30-50% of output tokens that your code immediately discards. More critically, free-form JSON generation has a 5-15% malformation rate, and every retry re-pays the full cost of the request, so a single retry doubles it. Structured output constrains the model to emit only schema-conforming data, with no filler. A secondary benefit: structured output is faster to parse and easier to validate, and it removes the fragile regex extraction and JSON.parse error handling from your pipeline. The modes are not equivalent: OpenAI structured outputs guarantee schema conformance by constraining decoding, which is stronger than JSON mode alone, since JSON mode only guarantees syntactically valid JSON. The main downside is that highly creative or open-ended generation tasks may feel constrained; for data extraction, classification, and transformation, structured output is the better choice on both cost and reliability.
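A minimal sketch of the parsing difference follows. The ticket-classification schema, the sample responses, and the `parse_ticket` helper are hypothetical. The `RESPONSE_FORMAT` dictionary follows the shape OpenAI documents for structured outputs as best understood here; treat it as an assumption and check it against the current API reference. No API call is made.

```python
import json

# Hypothetical schema for a ticket-classification task (not from the report).
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "category": {"type": "string", "enum": ["billing", "bug", "other"]},
        "urgent": {"type": "boolean"},
    },
    "required": ["category", "urgent"],
    "additionalProperties": False,
}

# Request fragment that would accompany the prompt (assumed shape, unverified).
RESPONSE_FORMAT = {
    "type": "json_schema",
    "json_schema": {"name": "ticket", "strict": True, "schema": TICKET_SCHEMA},
}


def parse_ticket(raw: str) -> dict:
    """Parse a schema-constrained response: one json.loads, no regex."""
    data = json.loads(raw)
    # Cheap defensive checks that mirror the schema above.
    if not isinstance(data, dict) or set(data) != set(TICKET_SCHEMA["required"]):
        raise ValueError("unexpected keys in response")
    if data["category"] not in TICKET_SCHEMA["properties"]["category"]["enum"]:
        raise ValueError("category outside the allowed enum")
    if not isinstance(data["urgent"], bool):
        raise ValueError("urgent must be a boolean")
    return data


# Illustrative responses: one schema-constrained, one free-form with filler.
structured = '{"category": "billing", "urgent": true}'
free_form = (
    'Sure! Here is the classification: {"category": "billing", "urgent": true} '
    "Let me know if you need anything else."
)

print(parse_ticket(structured))

try:
    json.loads(free_form)
except json.JSONDecodeError:
    print("free-form response needs extraction or a retry")
```

The free-form string carries the same payload but costs more tokens and cannot be parsed directly, which is the point at which pipelines reach for regex extraction or retries.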
Lifecycle
2026-06-20T04:18:00.552613+00:00— report_created — created