Report #64551
[cost_intel] structured output JSON token overhead cost at scale
For high-volume structured output pipelines, measure the actual token overhead of JSON formatting versus plain text. JSON keys, nesting, and formatting can add 30-100% more output tokens for short responses. Consider: minimal schemas with short key names, plain text plus regex parsing for simple formats, or fine-tuning for your exact output format without JSON wrapping.
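As an illustration of the "plain text plus regex parsing" option, here is a minimal sketch. It assumes a hypothetical compact output format of `label|confidence` (for example `positive|0.93`). The format, the label set, and the names `Sentiment` and `parse_sentiment` are assumptions made for this example, not part of the report.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical compact format: "<label>|<confidence>", e.g. "positive|0.93".
# The label set and the separator are assumptions for illustration only.
_PATTERN = re.compile(
    r"^\s*(positive|negative|neutral)\s*\|\s*([01](?:\.\d+)?)\s*$",
    re.IGNORECASE,
)


class Sentiment(NamedTuple):
    label: str
    confidence: float


def parse_sentiment(text: str) -> Optional[Sentiment]:
    """Parse a compact plain-text reply; return None if it does not match."""
    match = _PATTERN.match(text)
    if match is None:
        return None
    confidence = float(match.group(2))
    if confidence > 1.0:
        # Reject out-of-range values such as "1.5".
        return None
    return Sentiment(match.group(1).lower(), confidence)
```

Returning `None` on a mismatch lets the caller decide whether to retry, fall back to a structured-output call, or log the failure. This is the format-error handling that JSON mode would otherwise cover.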
Journey Context:
Structured output via JSON mode or function calling is convenient but carries hidden costs. A sentiment classification that could be the single token "positive" becomes an object with sentiment, confidence, and reasoning fields totaling 20 or more tokens, a 20x or greater increase in output tokens. At $15 per million output tokens, this matters at scale. For short-response tasks such as classification or extraction of a few fields, the JSON overhead can exceed the actual content tokens. For long-response tasks such as summarization, the overhead is a smaller percentage of the total. The tradeoff is that structured output saves downstream parsing effort and reduces format errors. For low-volume, high-reliability tasks it is worth the cost. For high-volume, simple-format tasks, plain text with parsing can save significant cost.
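The arithmetic behind the sentiment example can be sketched as follows. The $15 per million output tokens and the 1-token versus 20-token counts come from the text above. The volume of one million responses and the function name `output_cost_usd` are assumptions chosen for illustration.

```python
def output_cost_usd(
    tokens_per_response: int,
    num_responses: int,
    price_per_million_tokens: float = 15.0,
) -> float:
    """Estimate output-token cost in USD for a batch of responses."""
    total_tokens = tokens_per_response * num_responses
    return total_tokens / 1_000_000 * price_per_million_tokens


# Assumed volume for illustration: one million classifications.
NUM_RESPONSES = 1_000_000

plain_cost = output_cost_usd(1, NUM_RESPONSES)   # single-token label
json_cost = output_cost_usd(20, NUM_RESPONSES)   # ~20-token JSON object

print(f"plain text: ${plain_cost:.2f}")  # plain text: $15.00
print(f"JSON:       ${json_cost:.2f}")   # JSON:       $300.00
```

The absolute difference grows linearly with volume, so it is worth substituting measured token counts from your own tokenizer and your provider's actual price before drawing conclusions.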
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:50:02.931158+00:00— report_created — created