Report #74594
[cost_intel] JSON schema enforcement silently adding 15-30% token overhead
Account for 15-30% output token overhead when using structured outputs or JSON mode. Minimize schema fields, use short field names, and for high-volume pipelines consider fine-tuning a small model on the schema to eliminate per-request formatting instructions.
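The following is a minimal sketch of the short-field-name advice. It assumes a hypothetical classification output with three fields. The schema uses terse keys such as `cat` (the report's own example), and a small mapping restores descriptive names after parsing, so only the wire format is abbreviated. All field names and the `expand_keys` helper are illustrative. Character counts are only a rough proxy for tokens, and actual savings depend on the model's tokenizer.

```python
import json

# Hypothetical mapping from the terse keys the model emits to the
# descriptive names used elsewhere in the codebase.
SHORT_TO_LONG = {
    "cat": "category_type",
    "conf": "confidence_score",
    "sum": "summary_text",
}

# Hypothetical schema: only the fields actually needed, terse keys, no extras.
# Pass it to whatever structured-output mechanism your provider supports.
TERSE_SCHEMA = {
    "type": "object",
    "properties": {
        "cat": {"type": "string"},
        "conf": {"type": "number"},
        "sum": {"type": "string"},
    },
    "required": ["cat", "conf", "sum"],
    "additionalProperties": False,
}


def expand_keys(raw: str) -> dict:
    """Parse the model's terse JSON and rename keys to their descriptive form."""
    data = json.loads(raw)
    # Unknown keys pass through unchanged rather than being dropped.
    return {SHORT_TO_LONG.get(k, k): v for k, v in data.items()}


if __name__ == "__main__":
    # Sanity check: schema properties and the rename map must stay in sync.
    assert set(TERSE_SCHEMA["properties"]) == set(SHORT_TO_LONG)

    model_output = '{"cat":"billing","conf":0.92,"sum":"Refund request"}'
    record = expand_keys(model_output)
    print(record)

    # Same object serialized with descriptive keys, for a size comparison.
    verbose = json.dumps(record, separators=(",", ":"))
    print(len(model_output), "chars terse vs", len(verbose), "chars descriptive")
```

On the sample record, the terse form is 52 characters, compared with 83 for the same object with descriptive keys. Keeping the mapping in a single dict lets downstream code use readable names while the model only ever sees the short ones.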
Journey Context:
Structured outputs require the model to emit formatting tokens (braces, quotes, keys, commas) that don't exist in free-form text. Measured overhead: JSON-structured outputs are consistently 15-30% longer in tokens than equivalent free-form responses.

This compounds at volume. At 1M output tokens/day on GPT-4o ($10/1M output tokens), 15-30% overhead costs $1.50-$3.00/day in pure formatting, or about $550-$1,100/year. On GPT-4o-mini at scale (10M output tokens/day at $0.60/1M), it costs $0.90-$1.80/day, or about $330-$660/year.

The overhead comes from two sources: (1) structural tokens in the output for JSON formatting, which is what the 15-30% figure measures, and (2) schema instruction tokens in the prompt that tell the model how to format, which are billed as input tokens on every request.

Mitigation strategies: minimize the schema to only necessary fields (every field adds key tokens plus potential value verbosity), use terse field names ('cat' not 'category_type'), and for high-volume pipelines, fine-tune a small model on your schema. The fine-tuned model learns the format natively, so the schema instructions can be dropped from the prompt. It still emits JSON syntax, but at a small model's lower per-token price.
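The cost figures can be reproduced with a few lines of arithmetic. This minimal sketch follows the report's simplification, which treats the overhead as 15-30% of daily output volume. The function name `formatting_overhead_cost` is hypothetical. The GPT-4o price ($10/1M output) comes from the report. The GPT-4o-mini price ($0.60/1M output) is an assumed list price that may have changed, so substitute your current rates.

```python
def formatting_overhead_cost(
    output_tokens_per_day: float,
    price_per_million: float,
    overhead_pct: float,
    days: int = 365,
) -> tuple[float, float]:
    """Return (daily_cost, period_cost) in dollars spent on formatting tokens.

    Assumes overhead = overhead_pct * daily output tokens, as in the report.
    """
    overhead_tokens = output_tokens_per_day * overhead_pct
    daily = overhead_tokens / 1_000_000 * price_per_million
    return daily, daily * days


if __name__ == "__main__":
    # (output tokens/day, assumed $ per 1M output tokens)
    scenarios = {
        "gpt-4o, 1M tok/day @ $10/1M": (1_000_000, 10.00),
        "gpt-4o-mini, 10M tok/day @ $0.60/1M": (10_000_000, 0.60),
    }
    for name, (tokens, price) in scenarios.items():
        for pct in (0.15, 0.30):
            daily, yearly = formatting_overhead_cost(tokens, price, pct)
            print(f"{name}, {pct:.0%}: ${daily:.2f}/day, ${yearly:,.0f}/year")
```

With these inputs, it yields roughly $1.50-$3.00/day (about $550-$1,100/year) for GPT-4o and $0.90-$1.80/day (about $330-$660/year) for GPT-4o-mini. The cost is linear in every input, so doubling volume or price doubles the overhead cost.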
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:48:12.355269+00:00 | report_created | created