Report #78696
[cost_intel] Ignoring output token overhead when using structured output modes like JSON or function calling
When using structured output (JSON mode, function calling, Structured Outputs), budget for roughly 1.5-3x more output tokens than an equivalent natural language response. Use minimal JSON schemas (omit optional fields, use short field names), state explicit token budgets in prompts, and prefer function calling over JSON-in-markdown for small models, which tend to add explanatory text around the JSON.
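As a rough illustration of the minimal-schema advice, the sketch below serializes the same classification result two ways and compares their sizes. The field names and the sample payloads are hypothetical, and `rough_tokens` is a crude 4-characters-per-token heuristic, not a real tokenizer. For real measurements, use the tokenizer that matches your model.

```python
import json

# Hypothetical classifier outputs for the same email; field names are illustrative.
verbose = {
    "classification": "spam",
    "confidence_score": 0.97,
    "reasoning": "The message contains a suspicious link and urgent language.",
}
minimal = {"label": "spam"}  # enum-style value, no optional fields


def compact(payload: dict) -> str:
    """Serialize without the default spaces after ',' and ':'."""
    return json.dumps(payload, separators=(",", ":"))


def rough_tokens(text: str) -> int:
    """Crude estimate (~4 characters per token); NOT a real tokenizer."""
    return max(1, round(len(text) / 4))


for name, payload in (("verbose", verbose), ("minimal", minimal)):
    text = compact(payload)
    print(f"{name}: {len(text)} chars, ~{rough_tokens(text)} tokens")
```

The absolute numbers from a heuristic like this are not reliable, but the relative gap between the two payloads is usually enough to see which schema fields drive the overhead.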
Journey Context:
A natural language answer to 'classify this email' might be 'spam' (1 token). The same answer in JSON with confidence and reasoning fields can run to 15+ tokens. For one-word answers like this the inflation is about 15x, well above the 1.5-3x overhead cited for typical responses, and it matters because output tokens cost 3-5x more than input tokens on most models. At GPT-4o pricing ($10/M output tokens), 15 extra output tokens per call cost $0.00015 per call. At 10M calls/month, that is $1,500/month in structured output overhead alone. Small models compound this by adding unnecessary fields, verbose values, or explanatory text outside the JSON. OpenAI's Structured Outputs with strict: true helps by enforcing exact schema adherence, which rules out extra fields and text outside the schema. The fix stack: (1) use minimal schemas: omit optional fields and use enums instead of free text where possible; (2) set max_tokens close to the largest valid response you expect; (3) use function calling, which constrains format more tightly than JSON-in-text; (4) benchmark output token counts and trim schemas to reduce them.
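The cost arithmetic above can be reproduced with a small helper. This is a minimal sketch; `output_overhead_usd` is a hypothetical name, and the price is the $10 per 1M output tokens figure quoted in this report, which you should replace with your provider's current rate.

```python
def output_overhead_usd(extra_tokens_per_call: int,
                        calls_per_month: int,
                        usd_per_million_output_tokens: float) -> tuple[float, float]:
    """Return (cost per call, cost per month) of the extra output tokens."""
    per_call = extra_tokens_per_call * usd_per_million_output_tokens / 1_000_000
    per_month = (extra_tokens_per_call * calls_per_month
                 * usd_per_million_output_tokens / 1_000_000)
    return per_call, per_month


# Figures from this report: 15 extra tokens, 10M calls/month, $10 per 1M output tokens.
per_call, per_month = output_overhead_usd(15, 10_000_000, 10.0)
print(f"${per_call:.5f} per call, ${per_month:,.0f} per month")
# prints: $0.00015 per call, $1,500 per month
```

Plugging in your own measured token counts in place of the 15-token estimate gives a quick check on whether schema trimming is worth the engineering time.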
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:41:06.918598+00:00— report_created — created