Report #67949
[cost\_intel] JSON structured output failure rates eroding cheaper model savings
Factor retry rates into effective cost calculations. For complex nested JSON schemas, use structured output features \(OpenAI structured outputs, Anthropic tool use\) to enforce compliance at the API level on any model tier. For simple flat schemas, smaller models are fine without enforcement. A 15% retry rate on a 10x cheaper model makes it effectively 8.5x cheaper, not 10x, and that is before accounting for the latency and pipeline complexity that retries add.
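A minimal sketch of the arithmetic behind the 8.5x figure. It assumes every failed attempt is billed in full and retried until it succeeds, with the same failure probability on each attempt. The function name `effective_price_advantage` is illustrative and not part of any library.

```python
def effective_price_advantage(nominal_ratio: float, retry_rate: float) -> float:
    """Price advantage of the cheaper model once retries are paid for.

    Assumption: each attempt fails independently with probability
    retry_rate, and failed attempts cost the same as successful ones.
    """
    if not 0.0 <= retry_rate < 1.0:
        raise ValueError("retry_rate must be in [0, 1)")
    # Expected attempts per success is 1 / (1 - retry_rate), so the
    # nominal advantage shrinks by the factor (1 - retry_rate).
    return nominal_ratio * (1.0 - retry_rate)


# Compare a nominally 10x cheaper model at a few retry rates.
for rate in (0.01, 0.05, 0.15):
    advantage = effective_price_advantage(10, rate)
    print(f"retry rate {rate:.0%}: {advantage:.1f}x cheaper")
```

At a 15% retry rate this gives 10 × 0.85 = 8.5. Latency and pipeline complexity are not modeled here.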
Journey Context:
Smaller models have higher JSON malformation rates, and each malformed response requires a full-cost retry. On simple schemas \(flat objects, fewer than 5 fields, no nested arrays\), malformation rates are under 1% across all model tiers, so there is no meaningful difference. On complex schemas \(nested objects, arrays of arrays, optional fields with conditional logic, enum constraints\), smaller models can hit 5-15% malformation rates, versus under 1% on frontier models. Assuming each retry fails at the same rate as the first attempt, the effective cost formula is: effective\_cost = base\_cost / \(1 - retry\_rate\). At a 15% retry rate, effective cost is about 1.18x the base cost, which is the same as a 10x price advantage shrinking to 8.5x. This alone does not erase a 10x price advantage, but combined with quality degradation on the successfully parsed outputs and the engineering cost of retry logic, the real value proposition narrows significantly. The definitive fix is to use structured output features that enforce schema compliance at the API level, which eliminates malformation-driven retries regardless of model tier and removes the retry-rate variable from cost calculations.
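To put a number on the retry rate for a given schema and model, one option is to count billed attempts in the retry wrapper itself. The sketch below is illustrative only: `call_model` is a hypothetical zero-argument callable standing in for whatever API client is in use, and the check covers JSON syntax only, not schema compliance (nested structure, enums, and conditional fields would need a separate schema validator).

```python
import json
from typing import Callable


def call_with_retries(
    call_model: Callable[[], str], max_attempts: int = 3
) -> tuple[dict, int]:
    """Call the model until its output parses as a JSON object.

    Returns the parsed object and the number of attempts billed.
    call_model is a hypothetical stand-in for a real API call.
    """
    for attempt in range(1, max_attempts + 1):
        raw = call_model()
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output: this attempt was still paid for
        if isinstance(parsed, dict):
            return parsed, attempt
        # Valid JSON but not an object: treat as a failure and retry.
    raise RuntimeError(f"no valid JSON object after {max_attempts} attempts")


def effective_cost(base_cost: float, retry_rate: float) -> float:
    """Expected cost per successful response, per the formula above."""
    if not 0.0 <= retry_rate < 1.0:
        raise ValueError("retry_rate must be in [0, 1)")
    return base_cost / (1.0 - retry_rate)


# Scripted fake model: the first response is truncated, the second is valid.
responses = iter(['{"items": [[1, 2], [3', '{"items": [[1, 2], [3, 4]]}'])
parsed, attempts = call_with_retries(lambda: next(responses))
print(parsed, attempts)
print(round(effective_cost(1.0, 0.15), 2))
```

Averaging `attempts` over a representative batch gives the observed attempts per success, which can be compared against 1 / \(1 - retry\_rate\) to check whether the constant-failure-rate assumption holds for a given workload.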
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T20:31:59.128846+00:00— report_created — created