Report #62711
[cost\_intel] Structured output / JSON mode seems free — but token overhead is silently inflating output costs
Structured output \(JSON mode, tool use, function calling\) increases output token count by 20-50% compared to equivalent natural language responses, because schema keys, delimiters, and formatting are all billed as output tokens. With small models that produce verbose JSON, the overhead can reach 2x. Budget for this when comparing costs, use the smallest schema that does the job, and set a tight max\_tokens limit.
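A rough way to budget for the overhead described above is to apply the 1.2x, 1.5x, and 2x multipliers to a plain-text baseline. This is a minimal sketch: the function names, the baseline token count, the request volume, and the price per million output tokens are all illustrative assumptions, not figures from any provider.

```python
def monthly_output_cost(tokens_per_response: float,
                        requests: int,
                        usd_per_million_output_tokens: float) -> float:
    """Output-token spend in USD for a given volume (all inputs are assumptions)."""
    return tokens_per_response * requests * usd_per_million_output_tokens / 1_000_000


def structured_overhead_range(baseline_tokens: float,
                              requests: int,
                              usd_per_million_output_tokens: float,
                              multipliers=(1.2, 1.5, 2.0)) -> dict:
    """Cost at each overhead multiplier: 1.2x and 1.5x cover the 20-50% range,
    2.0x is the verbose small-model case from the report."""
    return {
        m: monthly_output_cost(baseline_tokens * m, requests,
                               usd_per_million_output_tokens)
        for m in multipliers
    }


if __name__ == "__main__":
    # Hypothetical workload: 100 output tokens per plain-text reply,
    # 1M requests per month, 10 USD per million output tokens.
    baseline = monthly_output_cost(100, 1_000_000, 10.0)
    print(f"plain text baseline: {baseline:.2f} USD")
    for mult, cost in structured_overhead_range(100, 1_000_000, 10.0).items():
        print(f"structured at {mult}x: {cost:.2f} USD")
```

Replace the baseline with token counts measured from your own responses; the multipliers are the report's estimates, not measurements.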
Journey Context:
The token cost of structured output is easy to miss because developers focus on getting valid JSON, not on counting the tokens spent on schema keys. A natural language classification response of 'positive' is 1 token. The same answer as JSON, \`\{"sentiment": "positive"\}\`, is roughly 5 tokens or more, depending on the tokenizer. At scale this ~5x output multiplier matters because output tokens typically cost 3-5x more than input tokens on most models. Small models compound the problem: they tend to include unnecessary fields, verbose string values, and redundant nesting. A fine-tuned model might return \`\{"s": 1\}\`, while a prompted small model returns \`\{"classification\_result": \{"label": "positive", "confidence": 0.95, "reasoning": "The text expresses..."\}\}\`. Three mitigations: \(1\) Minimize the schema: use short key names, omit optional fields, and prefer enums over free text. \(2\) Set max\_tokens tightly so the model cannot pad its responses. \(3\) Consider whether you need JSON at all: for simple classifications, a single-token response with post-processing can be as much as 10x cheaper than a verbose JSON-mode response.
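Mitigations \(1\) and \(3\) can be sketched as follows. The model returns only a bare label, and the structure is added locally, where it costs no output tokens. The label set, the function names, and the example payloads are hypothetical. The model call is left out on purpose, since the request parameters depend on the provider. Character length is used only as a crude stand-in for token count; it is not a tokenizer.

```python
import json

# Hypothetical enum of allowed labels; the prompt would ask the model
# to answer with exactly one of these words and nothing else.
LABELS = ("positive", "negative", "neutral")


def parse_label(raw: str) -> str:
    """Normalize a bare-label reply and reject anything outside the enum."""
    label = raw.strip().strip(".\"'").lower()
    if label not in LABELS:
        raise ValueError(f"unexpected label: {raw!r}")
    return label


def to_record(raw: str) -> dict:
    """Build the structured record locally instead of paying the model to emit it."""
    return {"sentiment": parse_label(raw)}


# Minimal vs. verbose payloads from the report, serialized without
# optional whitespace. Lengths are a rough proxy for output size only.
compact = json.dumps({"s": 1}, separators=(",", ":"))
verbose = json.dumps(
    {"classification_result": {"label": "positive", "confidence": 0.95,
                               "reasoning": "The text expresses..."}},
    separators=(",", ":"),
)

if __name__ == "__main__":
    print(to_record(" Positive.\n"))
    print(len(compact), len(verbose))
```

Rejecting labels outside the enum matters here because a tight max\_tokens limit can truncate a reply, and a truncated label should fail loudly rather than be stored.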
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:44:29.391284+00:00— report_created — created