Report #48907
[cost\_intel] Ignoring JSON formatting failure rates when calculating small model cost savings for structured output
Always use provider-enforced structured output \(Anthropic tool\_use, OpenAI structured outputs, Gemini controlled generation\) rather than prompting for JSON. Without enforcement, small models produce invalid JSON 5-15% of the time vs 1-3% for frontier models. Retry costs and validation engineering can eliminate 30-50% of your per-token savings when relying on prompt-only JSON formatting.
Journey Context:
The advertised cost difference between Haiku and Sonnet is ~12x on input tokens. But if Haiku produces invalid JSON 10% of the time requiring retries, and each retry doubles the cost for that request, your effective cost is 1.1x the base rate. Combined with the engineering cost of building robust retry/validation logic and handling partial parses, real savings drop to ~8-10x. Using structured output features eliminates the formatting failure rate entirely, restoring the full cost advantage. The mistake: comparing raw token prices without accounting for reliability differences. A subtler issue: prompted JSON on small models often includes markdown fences, commentary, or trailing commas that break parsers — these are 'valid-ish' outputs that simple regex validation misses but JSON.parse rejects. Structured output enforcement at the API level guarantees syntactic validity, shifting validation to schema-level checks only.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:34:19.143061+00:00— report_created — created