Report #59955
[cost_intel] Invalid JSON or schema violations in structured output mode trigger retry loops that resend the full context, burning 3-5x the expected token count per successful response
Use strict structured outputs (zod schema with strict: true) to guarantee valid JSON on first generation, or implement truncated context retries that exclude previous failed assistant messages.
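A minimal sketch of the second option (truncated context retries), assuming a generic async model call. The names `Message`, `ModelReply`, `CallModel`, and `generateJson` are hypothetical and do not come from any SDK; `callModel` stands in for whatever client function actually sends the request. The point illustrated is that every retry resends only the original messages and that token usage is summed across attempts rather than read per attempt.

```typescript
// Hypothetical types for illustration; adapt them to the client library in use.
type Role = "system" | "user" | "assistant";
interface Message { role: Role; content: string; }
interface ModelReply { text: string; promptTokens: number; completionTokens: number; }
type CallModel = (messages: Message[]) => Promise<ModelReply>;

interface RetryResult<T> {
  value: T | null;      // parsed output, or null if every attempt failed
  attempts: number;     // number of requests actually sent
  totalTokens: number;  // prompt + completion tokens summed over all attempts
}

async function generateJson<T>(
  callModel: CallModel,
  baseMessages: Message[],
  validate: (parsed: unknown) => parsed is T,
  maxAttempts = 3,
): Promise<RetryResult<T>> {
  let totalTokens = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    // Send the original messages only; failed assistant output is never appended.
    const reply = await callModel(baseMessages);
    totalTokens += reply.promptTokens + reply.completionTokens;
    try {
      const parsed: unknown = JSON.parse(reply.text);
      if (validate(parsed)) {
        return { value: parsed, attempts: attempt, totalTokens };
      }
    } catch {
      // Malformed JSON: fall through and retry with the same base context.
    }
  }
  return { value: null, attempts: maxAttempts, totalTokens };
}
```

Logging `totalTokens` per logical request, instead of usage per attempt, makes the cumulative cost of retries visible even when each individual attempt looks cheap.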
Journey Context:
When using OpenAI's JSON mode without strict validation, the model may return malformed JSON (trailing commas, unclosed braces) or schema violations 5-15% of the time on complex schemas. Standard error handling retries the request, and each retry resends the entire conversation history (system prompt + user message + failed assistant attempt) to the API. With a 20k-token context, a request that fails its initial attempt and all 3 retries burns at least 80k tokens (20k initial + 60k across retries) for zero usable output. At $10/1M tokens, that's $0.80 per failure. If 1% of 100k requests/day exhaust their retries this way, that's 1,000 failures and $800/day in pure waste. The trap is that developers check `usage.completion_tokens` and see a small number per attempt, not realizing that the resent prompt tokens accumulate across retries. The fix is using `strict: true` (OpenAI), which constrains generation to the supplied schema (barring refusals or truncation at the token limit) and avoids the retry loop, paying a small upfront schema overhead instead of the massive retry waste.
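The arithmetic above can be reproduced with a short sketch. `RetryCostInput` and `retryWaste` are hypothetical names introduced for illustration. The model is deliberately simplified, as in the report: it counts only the resent context and ignores completion tokens and the failed assistant messages that make each retry slightly larger.

```typescript
interface RetryCostInput {
  contextTokens: number;        // prompt tokens resent on every attempt
  retries: number;              // retries after the initial request
  usdPerMillionTokens: number;  // token price
  failuresPerDay: number;       // requests that exhaust every retry
}

interface RetryCost {
  tokensPerFailure: number;
  usdPerFailure: number;
  usdPerDay: number;
}

function retryWaste(input: RetryCostInput): RetryCost {
  // Initial attempt plus each retry resends the full context.
  const tokensPerFailure = input.contextTokens * (1 + input.retries);
  const usdPerFailure = (tokensPerFailure * input.usdPerMillionTokens) / 1_000_000;
  const usdPerDay =
    (tokensPerFailure * input.failuresPerDay * input.usdPerMillionTokens) / 1_000_000;
  return { tokensPerFailure, usdPerFailure, usdPerDay };
}

// The report's figures: 20k context, 3 retries, $10/1M tokens,
// 1% of 100k requests/day (1,000 requests) exhausting all retries.
const waste = retryWaste({
  contextTokens: 20_000,
  retries: 3,
  usdPerMillionTokens: 10,
  failuresPerDay: 1_000,
});
console.log(waste);
```

With these inputs the function returns 80,000 tokens and about $0.80 per failure, and about $800 per day, matching the figures in the report.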
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T07:07:23.239359+00:00— report_created — created