Report #50266
[cost\_intel] Structured output modes have no cost overhead
Account for structured output overhead: \(1\) schema definition tokens added to input, \(2\) 15-30% output token inflation from verbose JSON structure, \(3\) potential internal retries on complex schemas. For simple flat schemas on high-volume pipelines, consider requesting JSON in a standard completion and parsing it with code; this is nearly as reliable at lower token cost.
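A minimal Python sketch of the "parse with code" route for flat schemas. The field names `sentiment` and `score` and the helper `parse_flat_json` are hypothetical, chosen only for illustration. The regex fallback is one possible way to recover a JSON object when the model wraps it in prose or a code fence, and it assumes no string value contains a brace.

```python
import json
import re
from typing import Optional, Sequence

# Hypothetical field names, for illustration only.
REQUIRED_FIELDS = ("sentiment", "score")


def parse_flat_json(reply: str, required: Sequence[str] = REQUIRED_FIELDS) -> Optional[dict]:
    """Return the parsed object, or None if no usable object was found."""
    # First candidate: the whole reply, in case the model returned bare JSON.
    candidates = [reply.strip()]

    # Fallback candidate: the first {...} span that contains no nested braces.
    # This is enough for flat schemas wrapped in prose or a code fence, but it
    # will mis-split if a string value itself contains '{' or '}'.
    match = re.search(r"\{[^{}]*\}", reply)
    if match:
        candidates.append(match.group(0))

    for text in candidates:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue
        # Accept only a JSON object that carries every required field.
        if isinstance(obj, dict) and all(key in obj for key in required):
            return obj
    return None


if __name__ == "__main__":
    wrapped = 'Sure, here it is:\n{"sentiment": "neg", "score": 0.1}\nAnything else?'
    print(parse_flat_json(wrapped))
```

A `None` result is the signal to count a post-processing failure, retry, or route the request to a structured output mode instead. Tracking that rate on real traffic is what shows whether the cheaper route is reliable enough for a given schema.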
Journey Context:
Structured output modes \(OpenAI's structured outputs, Anthropic's tool use for JSON\) constrain generation to JSON that matches a schema. Per-token pricing does not change, but the total token count rises in three ways: \(1\) the schema definition is injected into the prompt \(50-500\+ tokens depending on complexity\); \(2\) the model produces more verbose output to satisfy the schema \(nested objects, repeated key names, null fields\), increasing output tokens by 15-30%; \(3\) on complex nested schemas, the model may fail to conform and retry internally. For a pipeline handling 100K requests/day with 200-token average outputs, a 25% output increase on GPT-4o \($10/M output tokens\) is 100K × 200 × 0.25 = 5M extra output tokens, or about $50/day, before counting the schema tokens added to each input. The crossover: for schemas with more than 10 fields or with nested objects, structured output is worth the overhead because it eliminates post-processing failures. For schemas with 2-5 flat fields, a standard completion prompted with 'respond in JSON: \{field1, field2, field3\}', plus code-side JSON.parse with a fallback regex extractor, is cheaper and nearly as reliable: GPT-4o-mini and Haiku follow simple JSON format instructions correctly >98% of the time.
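The daily cost figure can be reproduced with a few lines of Python. This is a rough sketch, not a billing model: the function name `extra_daily_cost` and its parameters are hypothetical, and the schema size and input price used in the second call are assumptions, since the report only gives the output price.

```python
def extra_daily_cost(
    requests_per_day: int,
    avg_output_tokens: float,
    output_inflation: float,
    output_price_per_million: float,
    schema_tokens: int = 0,
    input_price_per_million: float = 0.0,
) -> float:
    """Estimate extra dollars per day spent on structured output overhead."""
    # Overhead (2): extra output tokens from more verbose JSON.
    extra_output_tokens = requests_per_day * avg_output_tokens * output_inflation
    # Overhead (1): schema definition tokens added to every request's input.
    extra_input_tokens = requests_per_day * schema_tokens
    total = (
        extra_output_tokens * output_price_per_million
        + extra_input_tokens * input_price_per_million
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Figures from the report: 100K requests/day, 200-token outputs,
    # 25% inflation, $10 per million output tokens.
    print(extra_daily_cost(100_000, 200, 0.25, 10.0))  # 50.0

    # Same pipeline with schema input tokens included. The 200-token schema
    # and the $2.50 per million input price are assumptions for illustration.
    print(extra_daily_cost(100_000, 200, 0.25, 10.0,
                           schema_tokens=200, input_price_per_million=2.50))
```

Internal retries, overhead \(3\), are left out because the report gives no rate for them. If one is measured, it could be folded in as an additional multiplier on both token counts.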
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:51:27.273193+00:00— report_created — created