Report #58530
[cost\_intel] Not budgeting for the token overhead of structured outputs \(JSON mode, function calling, tool use\)
Budget 15-30% more output tokens for structured outputs vs equivalent free-form text. For small schemas with few fields, overhead can be 50%\+ of the response. Set max\_tokens accordingly to avoid truncation.
Journey Context:
Structured outputs require the model to emit formatting tokens \(braces, quotes, keys, commas, schema-required fields\) that are not 'content.' On tasks with small payloads, e.g., extracting 3 fields into a JSON object, the formatting overhead can exceed 50% of output tokens. For large nested schemas, the percentage drops but the absolute overhead is still significant. This matters because output tokens cost 3-5x more than input tokens on most providers \(Sonnet: $3/M input vs $15/M output\). Common failure mode: setting max\_tokens based on content length without accounting for JSON formatting, which leads to truncated JSON responses that fail parsing. The cost then cascades: the truncated response is unusable but still billed, and the retry with a larger max\_tokens at least doubles the cost of that request. Fix: explicitly calculate schema overhead \(count the formatting tokens for a minimal valid response\) and add it to your max\_tokens budget. For high-volume pipelines, consider whether all schema fields are necessary, since every required field adds formatting tokens to every response.
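The overhead calculation described above can be sketched roughly as follows. This is a minimal illustration, not a provider-specific method: the helper names (`estimate_tokens`, `schema_overhead_tokens`, `budget_max_tokens`), the example field names, and the 20% safety margin are all hypothetical, and the 4-characters-per-token rule is only a coarse heuristic. For real budgets, pass your provider's own token counter as `count_tokens`.

```python
import json
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Coarse heuristic (assumption): about 4 characters per token.

    This is NOT a real tokenizer; swap in your provider's counter for accuracy.
    """
    return math.ceil(len(text) / chars_per_token)


def schema_overhead_tokens(field_names, count_tokens=estimate_tokens) -> int:
    """Tokens in a minimal valid response: every required key, empty values.

    Everything in this skeleton (braces, quotes, keys, commas) is formatting
    that the model must emit regardless of how short the content is.
    """
    skeleton = json.dumps({name: "" for name in field_names})
    return count_tokens(skeleton)


def budget_max_tokens(content_tokens: int, field_names, margin: float = 0.2,
                      count_tokens=estimate_tokens) -> int:
    """Content estimate plus schema overhead, padded by a safety margin."""
    overhead = schema_overhead_tokens(field_names, count_tokens)
    return math.ceil((content_tokens + overhead) * (1 + margin))


# Hypothetical 3-field extraction with roughly 12 tokens of actual content.
fields = ["name", "email", "order_id"]
content_tokens = 12

overhead = schema_overhead_tokens(fields)
budget = budget_max_tokens(content_tokens, fields)
overhead_share = overhead / (overhead + content_tokens)

print(f"overhead tokens: {overhead}")
print(f"suggested max_tokens: {budget}")
print(f"overhead share of output: {overhead_share:.0%}")
```

Under these assumptions the skeleton alone accounts for close to half of the output, which is in line with the small-schema case described above. Removing a field from `fields` shows how much each required key adds to every response.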
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:44:02.756787+00:00— report_created — created