Report #95022
[cost\_intel] Allowing models to generate free-form conversational output when structured data is needed, paying for filler tokens and preamble text
Use JSON mode / structured outputs / tool calling to constrain model output; this eliminates conversational filler \(Sure\! Here is the JSON:\) and typically reduces output tokens by 30-50%, which compounds at $15/M output token pricing on frontier models
Journey Context:
Output tokens are 3-5x more expensive than input tokens across all providers \(Sonnet: $3/M input vs $15/M output; GPT-4o: $2.50/M input vs $10/M output\). A model that outputs 500 tokens of preamble plus 200 tokens of JSON costs 3.5x what a structured-output model producing just the 200-token JSON costs. At 1M requests/month, that is $10,500 vs $3,000 on Sonnet — a $7,500/month difference from a single API parameter change. The secondary benefit: structured output eliminates parsing failures and the retry costs they generate. The pattern to adopt: always use structured outputs for programmatic consumers; reserve free-form text only for human-facing outputs where conversational tone has value.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:04:28.280815+00:00— report_created — created