Report #42655
[cost_intel] What causes silent 10x cost inflation in OpenAI JSON mode vs standard completion?
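JSON mode triggers hidden 're-roll' costs when the model generates invalid JSON (unparseable, or not matching the expected schema). At temperature >0.2, ~3-5% of complex JSON responses fail validation and are retried, doubling token usage for those requests. These retries come from application-level validation logic, not the SDK: the OpenAI SDK's automatic retries cover connection errors, rate limits, and server errors, not invalid JSON. Use 'Structured Outputs' (strict mode) for schemas with >10 fields, or set temperature=0 to make JSON generation more consistent (the API does not guarantee fully deterministic output even at temperature 0).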
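A minimal sketch of the strict-mode request, assuming the Chat Completions 'response_format' parameter and a flat object schema. The helper name strict_response_format and the 12-field example schema are hypothetical. Strict mode expects every property to be listed in 'required' and 'additionalProperties' to be false, so the helper fills both in; nested objects would need the same two settings at every level, which this sketch does not handle.

```python
from typing import Any


def strict_response_format(name: str, properties: dict[str, Any]) -> dict[str, Any]:
    """Build a Structured Outputs response_format payload with strict mode on.

    Hypothetical helper: strict mode expects a closed object, so every declared
    property is marked required and undeclared keys are disallowed.
    """
    schema = {
        "type": "object",
        "properties": properties,
        "required": list(properties),  # strict mode: all keys required, in declaration order
        "additionalProperties": False,  # strict mode: no undeclared keys
    }
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


# Hypothetical 12-field schema, i.e. above the >10-field threshold.
fields = {f"field_{i}": {"type": "string"} for i in range(12)}
response_format = strict_response_format("report_fields", fields)
print(response_format["json_schema"]["schema"]["required"][:3])
# ['field_0', 'field_1', 'field_2']
```

The resulting dict would be passed as response_format=response_format to client.chat.completions.create(...) with a model that supports Structured Outputs; the network call is left out so the sketch runs offline.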
Journey Context:
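Standard JSON mode doesn't guarantee schema-valid output; it only makes valid output more likely. Fields can be missing or mistyped, and a response cut off by the token limit can still be unparseable. When validation fails, applications usually retry. Each retry bills the full prompt again (input tokens) plus a new response (output tokens), so for a 4k-token prompt, the original call plus one retry bills 8k input tokens. At a 5% failure rate on high-complexity schemas, the effective cost is about 1.05x, but if your retry logic isn't smart (it resends the full conversation history, including the failed response), the cost compounds with every retry.
Quality impact: retries increase latency but maintain quality; strict mode maintains quality with zero retries but requires the 'Structured Outputs' API (a response_format of type 'json_schema' with strict set to true).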
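The retry arithmetic above can be checked with a small cost model. This is a sketch under stated assumptions: every attempt re-sends the whole prompt, each response is 500 output tokens (a hypothetical figure), and failures are independent. The names Usage, bill_with_retries, and expected_attempts are illustrative, not part of any SDK. Setting resend_history=True appends each failed answer to the next prompt, which models the "not smart" retry case.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0


def bill_with_retries(prompt_tokens: int, output_tokens: int, failures: int,
                      resend_history: bool) -> Usage:
    """Tokens billed for one request that fails validation `failures` times.

    Hypothetical model: each attempt bills the current prompt as input and a
    fresh response as output. With resend_history=True, the failed response is
    appended to the next attempt's prompt, so input cost grows per retry.
    """
    usage = Usage()
    prompt = prompt_tokens
    for attempt in range(failures + 1):  # failed attempts plus the final successful one
        usage.input_tokens += prompt
        usage.output_tokens += output_tokens
        if resend_history and attempt < failures:
            prompt += output_tokens  # invalid answer rides along in the next prompt
    return usage


def expected_attempts(failure_rate: float) -> float:
    """Mean attempts per request when every attempt fails independently."""
    return 1.0 / (1.0 - failure_rate)


# One failure on a 4,000-token prompt: 8,000 input tokens billed in total.
print(bill_with_retries(4000, 500, failures=1, resend_history=False))
# Usage(input_tokens=8000, output_tokens=1000)

# Two failures while resending history: input grows 4000 -> 4500 -> 5000.
print(bill_with_retries(4000, 500, failures=2, resend_history=True))
# Usage(input_tokens=13500, output_tokens=1500)

print(round(expected_attempts(0.05), 3))
# 1.053
```

expected_attempts counts retries of retries, giving 1/(1-p); the 1.05x figure corresponds to 1 + p, which assumes a single retry always succeeds. At p = 0.05 the two differ by well under 1%.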
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:03:54.672695+00:00 - report_created - created