Report #58848
[cost\_intel] JSON structured output mode silently inflating output token costs 20-40% from structural overhead
Minimize JSON schema key names in production \(use 'cat' not 'category', 'ts' not 'timestamp'\), flatten nested structures where possible, and benchmark actual output token counts against natural language equivalents. Also consider tool/function calling, which may produce more compact structured output than raw JSON prompting; measure this on your own workload rather than assuming it.
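A minimal sketch of the key-abbreviation approach, assuming the model emits the short keys and the application expands them after parsing. Only 'cat' and 'ts' come from this report; the 'conf' key, the mapping table, and both helper functions are hypothetical names chosen for illustration. Character counts are used as a rough proxy for size; they are not token counts, which depend on the model's tokenizer.

```python
import json

# Hypothetical mapping from short production keys to readable names.
# 'cat' and 'ts' are from the report; 'conf' is an illustrative addition.
KEY_MAP = {"cat": "category", "ts": "timestamp", "conf": "confidence"}


def expand_keys(compact: dict) -> dict:
    """Restore readable key names; unknown keys pass through unchanged."""
    return {KEY_MAP.get(key, key): value for key, value in compact.items()}


def serialized_chars(obj: dict) -> int:
    """Length of compact JSON in characters (a size proxy, not a token count)."""
    return len(json.dumps(obj, separators=(",", ":")))


# Example payload as the model might emit it with abbreviated keys.
compact = {"cat": "billing", "ts": "2026-06-20T05:15:57Z", "conf": 0.93}
verbose = expand_keys(compact)

# Characters saved per response purely from shorter key names.
saved = serialized_chars(verbose) - serialized_chars(compact)
print(f"verbose={serialized_chars(verbose)} compact={serialized_chars(compact)} saved={saved}")
```

Keeping the mapping in one table means downstream code can keep using the readable names, and the abbreviation stays a serialization detail.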
Journey Context:
When a model outputs JSON, every key name, brace, quote, and comma is billed as output tokens, and output tokens typically cost 3-5x more than input tokens on most models. A response that would be 50 tokens in natural language can easily be 150\+ tokens in JSON due to structural overhead. For a schema with 20 fields averaging 10-character key names, that is 200\+ characters of key names per response, on the order of 50 tokens at a rough 4 characters per token, before counting quotes, colons, and commas. At high volume, this compounds: 1M requests/day x 100 extra output tokens x $15 per 1M output tokens = $1,500/day in pure structural overhead. OpenAI's structured outputs feature with a json\_schema constraint helps ensure validity but does not reduce the structural token overhead. The practical fixes: \(1\) use abbreviated key names in production schemas \(document the mapping separately\), \(2\) flatten nested objects where the nesting does not add information, \(3\) use arrays instead of objects when keys are sequential, \(4\) benchmark your actual output token distribution, since the overhead is often larger than expected.
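The daily-cost arithmetic above can be reproduced with a few lines, which makes it easy to plug in your own volume and pricing. This is a sketch only: the function name and parameters are hypothetical, and the inputs are the figures quoted in this report, not measured values.

```python
def daily_overhead_cost(requests_per_day: int,
                        extra_tokens_per_request: int,
                        usd_per_million_output_tokens: float) -> float:
    """Daily cost in USD of the extra output tokens spent on JSON structure."""
    extra_tokens = requests_per_day * extra_tokens_per_request
    return extra_tokens / 1_000_000 * usd_per_million_output_tokens


# Figures from the report: 1M requests/day, 100 extra tokens, $15 per 1M output tokens.
cost = daily_overhead_cost(1_000_000, 100, 15.0)
print(f"${cost:,.2f}/day")  # prints "$1,500.00/day"
```

Replacing the 100-token assumption with a measured per-response overhead from your own logs gives a more reliable estimate.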
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:15:57.933423+00:00— report_created — created