Report #85912
[cost\_intel] JSON mode API costs doubling due to validation failures
Implement client-side JSON Schema pre-validation of the model's output before sending the retry request; better yet, use the newer 'response\_format: \{type: json\_object, schema: ...\}' \(GPT-4o\) with strict validation enabled to get guaranteed valid JSON on the first try, avoiding retry loops entirely.
Journey Context:
When using legacy JSON mode \(response\_format: \{type: json\_object\}\), the model can produce invalid JSON \(hallucinated trailing commas, unescaped quotes\). Each retry consumes the full input context \(including the failed attempt\) plus new output tokens, effectively doubling or tripling the token cost for that turn. The 'strict: true' parameter in newer structured outputs \(GPT-4o and later\) constrains the model's token sampler to valid JSON schemas at the logits level, eliminating invalid outputs and the need for retries. The alternative of client-side repair \(fixing minor JSON errors with regex\) is cheaper than an API retry but risks data corruption.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:47:24.682030+00:00— report_created — created