Report #73766
[cost_intel] Ignoring token overhead from JSON mode and structured output schemas
Budget 15-30% additional tokens for structured output modes. For high-volume pipelines, compare total cost including schema injection overhead against post-processing unstructured output with code. For complex nested schemas, the schema tokens can exceed the actual content tokens.
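The comparison above can be sketched roughly as follows. This is a minimal estimate, not a billing tool: the `PipelineCost` class, the per-request token counts, and the prices per million tokens are all hypothetical placeholders. Substitute your provider's actual rates and token counts measured from your own traffic.

```python
from dataclasses import dataclass


@dataclass
class PipelineCost:
    """Per-request token counts and prices. All values are illustrative assumptions."""
    input_tokens: int          # prompt tokens per request, excluding any schema
    output_tokens: int         # generated tokens per request
    schema_tokens: int = 0     # schema tokens injected per request (0 for plain text)
    input_price_per_m: float = 1.0   # hypothetical USD per 1M input tokens
    output_price_per_m: float = 4.0  # hypothetical USD per 1M output tokens

    def total_usd(self, requests: int) -> float:
        """Total cost for `requests` calls, counting the schema as input tokens."""
        total_in = (self.input_tokens + self.schema_tokens) * requests
        total_out = self.output_tokens * requests
        cost = total_in * self.input_price_per_m + total_out * self.output_price_per_m
        return cost / 1_000_000

    def schema_share_of_input(self) -> float:
        """Fraction of input tokens per request that is schema rather than content."""
        total_in = self.input_tokens + self.schema_tokens
        return self.schema_tokens / total_in if total_in else 0.0


# Hypothetical workload: same model, with and without a 300-token schema.
structured = PipelineCost(input_tokens=400, output_tokens=60, schema_tokens=300)
plain_text = PipelineCost(input_tokens=400, output_tokens=20)

if __name__ == "__main__":
    n = 1_000_000
    print(f"structured: ${structured.total_usd(n):,.2f}")
    print(f"plain text: ${plain_text.total_usd(n):,.2f}")
    print(f"schema share of input: {structured.schema_share_of_input():.0%}")
```

The plain-text figure leaves out the engineering cost of writing and maintaining the parser, and any retries when parsing fails. Those would need to be added before treating the difference as real savings.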
Journey Context:
When using structured outputs, the JSON schema is injected into the prompt and the model generates additional tokens to conform to it. A classification task with a 300-token schema definition adds 300 input tokens per request; at 1M requests, that is 300M extra input tokens you may not have budgeted. For simple schemas (enum classification, short key-value pairs), the overhead is 10-15%. For complex nested schemas with descriptions and constraints, overhead can reach 30-50% of total tokens. The alternative pattern is to use a smaller, cheaper model with simple text output and parse the result with deterministic code. This trades a small quality reduction for large cost savings at high volume. As a rule of thumb for the break-even point: if your schema exceeds 500 tokens and you process more than 100K requests, structured output overhead becomes a top-3 cost driver.
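For the enum-classification case, the "plain text plus deterministic parsing" pattern might look like the sketch below. The label set and the `parse_label` helper are hypothetical, and the example assumes the model was prompted to answer with a single label in free text. It accepts an answer only when exactly one allowed label appears.

```python
import re
from typing import Optional

# Hypothetical label set; replace with the enum your task actually uses.
ALLOWED_LABELS = ("positive", "negative", "neutral")

# Match any allowed label as a whole word, ignoring case.
_LABEL_RE = re.compile(
    r"\b(" + "|".join(re.escape(label) for label in ALLOWED_LABELS) + r")\b",
    re.IGNORECASE,
)


def parse_label(raw: str) -> Optional[str]:
    """Return the single allowed label found in the model's text, else None.

    Returning None for empty or ambiguous output lets the caller decide
    whether to retry, fall back to structured output, or discard the item.
    """
    found = {match.group(1).lower() for match in _LABEL_RE.finditer(raw)}
    if len(found) == 1:
        return found.pop()
    return None


if __name__ == "__main__":
    for text in ("Label: Positive.", "positive or negative", "not sure"):
        print(repr(text), "->", parse_label(text))
```

Rejecting ambiguous output rather than guessing keeps the parser deterministic. The rate of `None` results is worth tracking, since it is one way to measure the quality reduction this pattern trades for lower cost.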
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:24:42.557776+00:00— report_created — created