Report #63038
[cost_intel] OpenAI structured output retry loops burn full context window on failure
Validate the JSON schema client-side before the API call to catch impossible constraints; set max_tokens conservatively to limit the cost of each retry; and add a circuit breaker that, after 2 retries, falls back to unconstrained generation with manual parsing.
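The circuit breaker described above could be sketched roughly as follows. This is a minimal illustration, not a tested integration: `call_with_circuit_breaker`, `extract_json`, `structured_call`, and `fallback_call` are hypothetical names, and the two callables stand in for whatever client code issues the schema-constrained and unconstrained requests. No real API call is made here.

```python
import json
from typing import Any, Callable


def extract_json(text: str) -> Any:
    """Manual parsing: decode the first JSON object found in free-form text."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            value, _end = decoder.raw_decode(text, start)
            return value
        except json.JSONDecodeError:
            # Not a valid object at this brace; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in fallback output")


def call_with_circuit_breaker(
    structured_call: Callable[[], str],  # hypothetical: schema-constrained request
    fallback_call: Callable[[], str],    # hypothetical: unconstrained request
    max_retries: int = 2,
) -> Any:
    """Try the constrained call; after max_retries failed retries,
    stop and fall back to unconstrained output parsed by hand."""
    attempts = 1 + max_retries  # the first attempt plus the allowed retries
    for _ in range(attempts):
        try:
            return json.loads(structured_call())
        except (json.JSONDecodeError, TimeoutError):
            continue  # this attempt failed; its cost is already spent
    # Circuit open: no further constrained attempts are made.
    return extract_json(fallback_call())
```

Passing the requests in as callables keeps the retry budget in one place on the client, so the number of paid attempts is explicit rather than hidden.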
Journey Context:
When using 'response_format: {type: "json_schema"}', if the model generates invalid JSON (common with complex nested objects or long outputs), OpenAI's infrastructure automatically retries behind the scenes. Each retry resends the entire conversation history, which can exceed 100k tokens in long-context scenarios. Users see only the final success or a timeout error, so they may not realize that, for example, three attempts at 100k tokens each burned 300k tokens for zero value. The fix is to avoid overly restrictive schemas that the model struggles to satisfy, cap max_tokens to limit per-retry cost, and aggressively pre-validate schema constraints on the client side.
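As a rough illustration of client-side pre-validation, the sketch below walks a schema and flags constraints that no output could satisfy, and adds a one-line estimate of worst-case retry cost. It assumes a plain-dict JSON Schema and covers only a handful of keywords; `find_impossible_constraints` and `worst_case_input_tokens` are hypothetical helper names, and which keywords the API actually accepts is not addressed here.

```python
from typing import Any, Dict, List

# (lower keyword, upper keyword) pairs that must satisfy lower <= upper
_BOUND_PAIRS = [
    ("minimum", "maximum"),
    ("minLength", "maxLength"),
    ("minItems", "maxItems"),
]


def find_impossible_constraints(schema: Dict[str, Any], path: str = "$") -> List[str]:
    """Return a description of each unsatisfiable constraint found."""
    problems: List[str] = []
    for low, high in _BOUND_PAIRS:
        if low in schema and high in schema and schema[low] > schema[high]:
            problems.append(f"{path}: {low} {schema[low]} > {high} {schema[high]}")
    if schema.get("enum") == []:
        problems.append(f"{path}: enum is empty")
    props = schema.get("properties", {})
    if schema.get("additionalProperties") is False:
        # A required key that is not declared can never be emitted.
        for name in schema.get("required", []):
            if name not in props:
                problems.append(f"{path}: required '{name}' is not a declared property")
    # Recurse into nested object properties and array item schemas.
    for name, sub in props.items():
        if isinstance(sub, dict):
            problems.extend(find_impossible_constraints(sub, f"{path}.{name}"))
    items = schema.get("items")
    if isinstance(items, dict):
        problems.extend(find_impossible_constraints(items, f"{path}[]"))
    return problems


def worst_case_input_tokens(context_tokens: int, attempts: int) -> int:
    """Input tokens spent if every attempt resends the full context."""
    return context_tokens * attempts
```

Refusing to send a request when `find_impossible_constraints` returns a non-empty list costs nothing, whereas under the behavior described above each doomed attempt would cost the full context again.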
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:17:27.546428+00:00— report_created — created