Report #78864
[cost_intel] Ignoring 20-50% output token overhead from JSON schema enforcement in structured output mode
Minimize JSON schema complexity for structured output: use flat schemas, shorten field names, and avoid deeply nested objects. For complex schemas, consider having the model generate a concise natural-language or delimited response and post-processing it into JSON, so that structural tokens are not billed at output rates, which run 3-5x input rates.
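A minimal sketch of the post-processing step, assuming the model is prompted to reply with a single pipe-delimited line such as `positive|0.95|yes`. The field order, the `parse_compact` helper, and the allowed sentiment values are illustrative assumptions, not part of any provider's API.

```python
import json

# Hypothetical field order that the prompt would instruct the model to follow.
FIELDS = ("sentiment", "confidence", "flagged")
# Assumed enum of sentiment labels; adjust to the real schema.
ALLOWED_SENTIMENTS = {"positive", "negative", "neutral"}


def parse_compact(line: str) -> dict:
    """Convert 'positive|0.95|yes' into a dict; raise ValueError on malformed input."""
    parts = [p.strip() for p in line.strip().split("|")]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {line!r}")

    sentiment, confidence, flagged = parts

    sentiment = sentiment.lower()
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"unknown sentiment: {sentiment!r}")

    score = float(confidence)  # raises ValueError if not numeric
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"confidence out of range: {confidence!r}")

    flag = flagged.lower()
    if flag not in {"yes", "no"}:
        raise ValueError(f"flagged must be yes/no, got {flagged!r}")

    return {"sentiment": sentiment, "confidence": score, "flagged": flag == "yes"}


if __name__ == "__main__":
    # The JSON is built locally, so its braces, quotes, and keys cost no output tokens.
    print(json.dumps(parse_compact("positive|0.95|yes")))
```

Strict validation matters here because the model is no longer constrained to a schema; a malformed line should fail loudly so the caller can retry or fall back to structured output.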
Journey Context:
Structured output (OpenAI's JSON mode, Anthropic's tool use) forces the model to generate valid JSON, which means emitting every structural token: braces, quotes, keys, commas. A response that takes about 50 tokens in compact natural language can grow to 120+ tokens as JSON; in miniature, 'Yes, positive, 0.95' becomes '{"sentiment": "positive", "confidence": 0.95, "flagged": true}'. Because output tokens are billed at 3-5x the input rate on most models, this overhead is disproportionately expensive. At scale (10M requests/month), an extra 70 output tokens per request adds up to 700M tokens/month, which at $60 per million output tokens comes to $42,000/month in pure structural overhead. That $60/M figure matches the original GPT-4 list price; GPT-4o's output rate is lower, so substitute your model's current price, since the overhead scales linearly with it. Mitigations: (1) use short field names ('sent' not 'sentiment'), (2) prefer flat over nested schemas, (3) for very complex schemas, have the model generate a compact delimited format and parse it yourself, (4) use enums instead of free-text fields where possible to constrain output length.
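The cost arithmetic above can be checked with a short sketch. The function name `monthly_overhead_usd` is hypothetical, and the price is passed in as a parameter because per-token rates vary by model and change over time.

```python
def monthly_overhead_usd(
    requests_per_month: int,
    extra_output_tokens: int,
    usd_per_million_output: float,
) -> float:
    """Estimate the monthly cost of extra output tokens emitted on every request."""
    extra_tokens = requests_per_month * extra_output_tokens
    return extra_tokens / 1_000_000 * usd_per_million_output


if __name__ == "__main__":
    # Figures from the report: 10M requests/month, 70 extra tokens each, $60 per 1M output tokens.
    cost = monthly_overhead_usd(10_000_000, 70, 60.0)
    print(f"${cost:,.0f}/month")  # prints "$42,000/month"
```

Rerunning the same function with your own request volume, measured token delta, and current output price gives a more reliable estimate than the illustrative figures.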
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:58:05.761852+00:00— report_created — created