Report #64119
[cost\_intel] Ignoring 15-30% output token overhead from JSON schema enforcement and structured output modes
Design extraction schemas with short field names, flatten nested objects, and fill in inferable fields \(timestamps, IDs, defaults\) in post-processing instead of asking the LLM to generate them. To quantify the overhead, compare the output token count of your JSON response against the same content as plain text.
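A minimal Python sketch of both steps, under stated assumptions. `rough_token_count` is a crude stand-in that counts word runs and punctuation marks. It is not a real tokenizer and will overstate JSON overhead, so pass your provider's tokenizer as `count` for billing-accurate numbers. The function names, field names (`id`, `created_at`, `plan`), and sample data are hypothetical.

```python
import json
import re
import uuid
from datetime import datetime, timezone


def rough_token_count(text: str) -> int:
    """Crude proxy (assumption): one token per word run or punctuation mark."""
    return len(re.findall(r"\w+|[^\w\s]", text))


def overhead_ratio(json_text: str, plain_text: str, count=rough_token_count) -> float:
    """Fractional extra tokens of the JSON form relative to the plain-text form."""
    plain = count(plain_text)
    return (count(json_text) - plain) / plain


def fill_inferable_fields(extracted: dict, defaults: dict) -> dict:
    """Add fields the model never needs to generate: defaults, ID, timestamp."""
    record = {**defaults, **extracted}  # values from the model win over defaults
    record.setdefault("id", str(uuid.uuid4()))
    record.setdefault("created_at", datetime.now(timezone.utc).isoformat())
    return record


# Same content in both forms (hypothetical sample).
plain = "Alice Smith, zip 94107, plan pro"
as_json = json.dumps({"name": "Alice Smith", "zip": "94107", "plan": "pro"})
print(f"overhead: {overhead_ratio(as_json, plain):.0%}")

# The model returns only what it must extract; the rest is filled in locally.
record = fill_inferable_fields({"name": "Alice Smith", "zip": "94107"}, {"plan": "free"})
print(sorted(record))
```

Running the comparison on a sample of real responses, with a real tokenizer, gives a per-request overhead figure to feed into a cost estimate.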
Journey Context:
Structured output modes \(OpenAI structured outputs, Anthropic tool-use-for-JSON\) add system-level tokens for schema enforcement and produce verbose JSON with quoted keys, commas, brackets, and null values for empty fields. A response that is 100 tokens as free text becomes 130-150 tokens as JSON. At scale, this 30-50% output token inflation is significant because output tokens cost 3-5x more than input tokens on most providers. The non-obvious cost is that deeply nested schemas compound the overhead: a 3-level nested object can double the token count compared with a flat structure that uses concatenated field names. A field addressed as 'user\_profile.address.zip\_code' costs roughly 3x the tokens of 'zip'. At 10M requests/month, switching from verbose nested JSON to flat, short-named JSON can save thousands of dollars in output token costs alone.
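The flattening and the savings arithmetic can be sketched in Python as follows. The helper names, the rename map, the 40 tokens saved per request, and the $10 per million output tokens are all illustrative assumptions, not measured values or any provider's actual pricing.

```python
def flatten(nested: dict, rename: dict, prefix: str = "") -> dict:
    """Flatten nested dicts to one level, mapping dotted paths to short names.

    Paths missing from `rename` fall back to the dotted path with
    dots replaced by underscores.
    """
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, rename, path))
        else:
            flat[rename.get(path, path.replace(".", "_"))] = value
    return flat


def monthly_savings(tokens_saved_per_request: float,
                    requests_per_month: int,
                    usd_per_million_output_tokens: float) -> float:
    """Dollars saved per month from emitting fewer output tokens."""
    total_tokens = tokens_saved_per_request * requests_per_month
    return total_tokens * usd_per_million_output_tokens / 1_000_000


nested = {"user_profile": {"address": {"zip_code": "94107"}, "name": "Alice"}}
short_names = {"user_profile.address.zip_code": "zip", "user_profile.name": "name"}
flat = flatten(nested, short_names)
print(flat)  # {'zip': '94107', 'name': 'Alice'}

# Hypothetical inputs: 40 tokens saved per request, 10M requests, $10 per 1M tokens.
print(monthly_savings(40, 10_000_000, 10.0))  # 4000.0
```

In practice the flat schema is what the model is asked to produce, and the rename map is applied in reverse after parsing if downstream code still expects the nested shape.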
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:06:38.105470+00:00— report_created — created