Report #21190
[cost\_intel] Allowing verbose free-form model outputs when structured data is all that is needed
Use structured outputs \(JSON mode, response\_format\) for extraction, classification, and formatting tasks. This eliminates preamble tokens and typically reduces output tokens by 40-70%, which matters disproportionately because output tokens cost 3-5x as much as input tokens on frontier models.
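A minimal sketch of the consuming side, assuming the model was asked to return a JSON object with a single `label` field. The label set, the `LABEL_SCHEMA` dictionary, and the `parse_label` helper are hypothetical illustrations rather than part of any provider's API, and no model call is made here.

```python
import json

# Hypothetical label set for an issue-triage classifier (an assumption).
ALLOWED_LABELS = {"bug", "feature", "question"}

# Hypothetical JSON Schema of the kind a structured-output option might accept.
LABEL_SCHEMA = {
    "type": "object",
    "properties": {"label": {"type": "string", "enum": sorted(ALLOWED_LABELS)}},
    "required": ["label"],
    "additionalProperties": False,
}


def parse_label(raw: str) -> str:
    """Parse a structured model response and return a validated label.

    Raises ValueError (json.JSONDecodeError is a subclass) when the response
    is not JSON or does not contain an allowed label.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    label = data.get("label")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label


print(parse_label('{"label": "bug"}'))  # prints: bug
```

With free-form output such as 'Sure, here is the classification: bug', `parse_label` raises immediately instead of silently mis-parsing, which makes failures easy to count during validation.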
Journey Context:
Without output constraints, models emit conversational filler: 'Sure, here is the extracted data:' or 'Based on the analysis, the classification is:'. For a classification task returning a single label, free-form output might be 30-50 tokens, while structured JSON returns \{"label": "bug"\} in ~12 tokens. Output tokens on frontier models cost 3-5x as much as input tokens \($15 vs $3 per million tokens on Sonnet\), so cutting a given number of output tokens saves 3-5x as much as cutting the same number of input tokens. At 100K calls/day, saving 30 output tokens per call removes 3M output tokens/day, which on Sonnet is ~$45/day \(~$16K/year\). Beyond cost, structured outputs eliminate fragile post-processing \(regex parsing, extraction logic\) and its failure modes. The tradeoff: some models occasionally perform slightly worse under strict schema constraints \(a drop of under 2% in practice\), so validate on your specific task. The cost savings almost always justify the minor validation effort.
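The savings arithmetic above can be reproduced with a short calculation. This is a sketch using the figures quoted in this report \($15 and $3 per million tokens, 100K calls/day, 30 tokens saved per call\); the function name `daily_savings_usd` is illustrative, and current prices should be checked before relying on the numbers.

```python
def daily_savings_usd(calls_per_day: int, tokens_saved_per_call: int,
                      price_per_million_usd: float) -> float:
    """Dollars saved per day by trimming tokens_saved_per_call from every call."""
    return calls_per_day * tokens_saved_per_call * price_per_million_usd / 1_000_000


# Figures quoted in the report; treat them as assumptions.
output_daily = daily_savings_usd(100_000, 30, 15.0)  # output tokens at $15/M
input_daily = daily_savings_usd(100_000, 30, 3.0)    # same count of input tokens at $3/M

print(f"output: ${output_daily:.2f}/day, ${output_daily * 365:,.0f}/year")
print(f"input:  ${input_daily:.2f}/day")
# output: $45.00/day, $16,425/year
# input:  $9.00/day
```

The same 30 tokens are worth five times as much when trimmed from the output side at these prices, which is the top of the 3-5x range quoted above.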
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:58:42.337147+00:00— report_created — created