Report #43074
[cost_intel] JSON mode and structured output adding 15-30% token overhead that compounds at scale
Budget 15-30% additional output tokens for structured output modes vs natural language. Use the simplest schema that works: nested objects and long enum lists inflate both the schema prompt and the output. For high-volume simple extractions, prompt for constrained natural language ('respond with only the category name') and parse with regex instead of using full structured output.
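A minimal sketch of the regex-parsing side of that recommendation, assuming the model was prompted to respond with only the category name. The label set and the function name `parse_category` are hypothetical and chosen purely for illustration.

```python
import re
from typing import Optional

# Hypothetical label set for illustration; not taken from the report.
CATEGORIES = ("billing", "technical", "account", "shipping", "other")

# Accept exactly one known label, tolerating surrounding whitespace,
# optional quotes, and a trailing period. Anything else is rejected.
_LABEL_RE = re.compile(
    r"^\s*[\"']?(" + "|".join(map(re.escape, CATEGORIES)) + r")[\"']?\.?\s*$",
    re.IGNORECASE,
)


def parse_category(reply: str) -> Optional[str]:
    """Return the normalized category, or None if the reply is off-format."""
    match = _LABEL_RE.match(reply)
    return match.group(1).lower() if match else None


if __name__ == "__main__":
    print(parse_category("Billing"))
    print(parse_category("I think this is billing"))
```

Returning `None` for off-format replies, rather than guessing, lets the caller retry or fall back to full structured output for that one request.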
Journey Context:
Structured output modes (OpenAI structured outputs, function calling, Anthropic tool use) inject schema definitions into the prompt and constrain the output format, adding token overhead on both the input and output sides. A simple classification with 5 categories adds ~50-100 tokens of schema overhead per call. A complex nested schema with 20 fields and descriptions adds 500-1000+ tokens per call. At 10M calls/month, the complex schema alone accounts for 5-10B extra input tokens, or $15,000-30,000/month at Sonnet input pricing (which implies a rate of about $3 per million input tokens), just for schema repetition. The alternative for simple extractions is to prompt for a specific format in natural language ('respond with only: YES or NO') and parse the reply with a regex or simple string match. This works reliably for about 80% of structured output needs and avoids the overhead. Reserve full structured output / JSON mode for complex schemas where parsing reliability justifies the cost, or where the schema itself is long enough to benefit from caching. The hybrid approach is to use structured output for complex tasks and simple constrained prompting for high-volume simple tasks.
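The schema-repetition estimate above can be reproduced with a few lines of arithmetic. This is a sketch only: the function name is hypothetical, and the $3 per million input tokens rate is an assumption inferred from the report's own figures, so check current pricing before relying on it.

```python
def monthly_schema_cost(
    calls_per_month: int,
    schema_tokens_per_call: int,
    usd_per_million_input_tokens: float,
) -> float:
    """Estimate the monthly cost of re-sending a schema on every call."""
    extra_tokens = calls_per_month * schema_tokens_per_call
    return extra_tokens / 1_000_000 * usd_per_million_input_tokens


if __name__ == "__main__":
    ASSUMED_RATE = 3.0  # USD per million input tokens (assumption, see lead-in)
    for schema_tokens in (500, 1000):
        cost = monthly_schema_cost(10_000_000, schema_tokens, ASSUMED_RATE)
        print(f"{schema_tokens} schema tokens/call: ${cost:,.0f}/month")
```

Running the same function with the 50-100 token figure for a simple classifier shows how much smaller the overhead is for flat schemas at the same volume.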
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:46:27.409266+00:00 - report_created - created