Report #23130
[cost_intel] Verbose natural-language model outputs silently inflating token costs on structured tasks
Use structured outputs (JSON mode, function calling, tool_use) for any task where the output maps to a defined schema: classification, extraction, parameter generation, structured code generation. This eliminates conversational filler and typically reduces output tokens by 40-60%, which matters disproportionately because output tokens are priced 3-5x higher than input tokens.
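A minimal sketch of what "maps to a defined schema" can look like for the classification case, assuming a hypothetical two-label spam filter. It pairs a JSON Schema object of the kind structured-output modes accept with a strict client-side parser that rejects anything not exactly that shape. The label set, the `classification` field, and the `parse_classification` helper are illustrative; how the schema is attached to a request differs by provider and is not shown.

```python
"""Sketch: JSON Schema for a classification task plus a strict parser.

Label set, field name, and helper name are hypothetical. The schema dict is
the kind of object passed to a provider's JSON-mode / function-calling /
tool_use parameter; the request wiring itself is provider-specific.
"""

import json

LABELS = ("spam", "not_spam")  # hypothetical label set

CLASSIFICATION_SCHEMA = {
    "type": "object",
    "properties": {
        "classification": {"type": "string", "enum": list(LABELS)},
    },
    "required": ["classification"],
    "additionalProperties": False,
}


def parse_classification(raw: str) -> str:
    """Return the label from a structured response, or raise ValueError.

    Validating on the client side, even with schema-constrained decoding,
    catches provider or configuration drift before bad data enters the
    pipeline. json.JSONDecodeError is itself a subclass of ValueError.
    """
    obj = json.loads(raw)
    # Require exactly one key, mirroring additionalProperties: False.
    if not isinstance(obj, dict) or set(obj) != {"classification"}:
        raise ValueError(f"unexpected shape: {obj!r}")
    label = obj["classification"]
    # Mirror the enum constraint from the schema.
    if label not in LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return label
```

A conversational reply such as `Based on the email content, I would classify this as: SPAM.` fails at `json.loads` with a `ValueError`, so a pipeline can count and retry failures explicitly instead of guessing with a regex.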
Journey Context:
Models default to conversational output: 'Based on the email content, I would classify this as: SPAM. The reasoning is...' runs to about 30 tokens for what could be {"classification": "spam"} at about 5 tokens. That 6x token inflation is a 6x cost inflation on the output portion of the call. The 3-5x output-to-input price ratio does not compound it, because both versions are billed at the output rate; what the ratio does mean is that each wasted output token costs as much as 3-5 input tokens, so trimming output saves more per token than trimming the prompt. For a pipeline processing 100K documents/day, the classification answers alone come to about 3M output tokens/day in conversational form versus about 500K in structured form. Structured output modes constrain generation to the schema, eliminating filler.

Bonus: structured outputs also largely eliminate parsing failures and the retries they trigger. In pipelines that run regex or JSON extraction over freeform text, those retries add 10-20% to effective token usage.

The one caveat: some models occasionally produce lower-quality results when forced into very strict schemas, so always test quality on 100+ examples before deploying structured output at scale. For coding agents, use structured outputs for plan generation (list of steps), file edit specifications (file, line range, replacement), and test result interpretation (pass/fail/reason).
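The token arithmetic in this report can be checked with a few lines of Python. The token counts (30 vs 5 per answer) and volume (100K documents/day) come from this report; the per-million-token prices are hypothetical placeholders, with output priced at 4x input to sit inside the cited 3-5x range. Substitute your provider's actual rates.

```python
"""Output-token cost of conversational vs structured answers (sketch).

Token counts and daily volume are taken from the report; both prices are
hypothetical placeholders, not any provider's published rates.
"""

DOCS_PER_DAY = 100_000
VERBOSE_OUTPUT_TOKENS = 30     # "Based on the email content, I would classify..."
STRUCTURED_OUTPUT_TOKENS = 5   # {"classification": "spam"}

# Hypothetical USD prices per 1M tokens; output is 4x input here.
INPUT_PRICE_PER_M = 3.0
OUTPUT_PRICE_PER_M = 12.0


def output_cost_per_day(tokens_per_answer: int) -> float:
    """Daily USD spent on output tokens, one answer per document."""
    return DOCS_PER_DAY * tokens_per_answer * OUTPUT_PRICE_PER_M / 1_000_000


def input_equivalent_tokens(output_tokens: int) -> float:
    """Number of input tokens that cost the same as `output_tokens` output tokens."""
    return output_tokens * OUTPUT_PRICE_PER_M / INPUT_PRICE_PER_M


verbose = output_cost_per_day(VERBOSE_OUTPUT_TOKENS)
structured = output_cost_per_day(STRUCTURED_OUTPUT_TOKENS)
print(f"verbose:    ${verbose:.2f}/day")
print(f"structured: ${structured:.2f}/day")
# The ratio is 30 / 5 = 6x regardless of OUTPUT_PRICE_PER_M, because the
# price scales both sides equally.
print(f"ratio:      {verbose / structured:.1f}x")
# The price ratio matters when comparing output savings with input savings.
saved = VERBOSE_OUTPUT_TOKENS - STRUCTURED_OUTPUT_TOKENS
print(f"{saved} saved output tokens cost as much as "
      f"{input_equivalent_tokens(saved):.0f} input tokens")
```

Changing `OUTPUT_PRICE_PER_M` moves both daily figures but leaves the ratio at 6x; the output-to-input price ratio only shows up in the input-equivalent comparison.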
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:14:05.280387+00:00— report_created — created