Report #71151
[cost_intel] OpenAI structured output retry loops burning 3-5x expected tokens on complex schemas
Validate the JSON schema client-side before the API call to catch impossible constraints, and set max_tokens conservatively so that unfulfillable requests fail fast instead of letting the model spin. For simpler schemas, use 'response_format': {'type': 'json_object'} instead of strict structured outputs.
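A minimal sketch of the pre-flight check described above, in Python with the standard library only. The helper names (find_impossible_constraints, build_request_kwargs), the default max_tokens of 512, and the schema name "result" are hypothetical choices for illustration, not part of the original report. The check covers only a few obvious contradictions (a lower bound above its upper bound, an empty enum, a required property that is never declared) and is not a full JSON Schema validator. No API call is made; the function only builds the keyword arguments you might pass to a chat completion request.

```python
from typing import Any

# JSON Schema keyword pairs where the lower bound must not exceed the upper bound.
_BOUND_PAIRS = [
    ("minLength", "maxLength"),
    ("minItems", "maxItems"),
    ("minimum", "maximum"),
    ("minProperties", "maxProperties"),
]


def find_impossible_constraints(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return human-readable descriptions of constraints that can never be met."""
    problems: list[str] = []

    for low, high in _BOUND_PAIRS:
        if low in schema and high in schema and schema[low] > schema[high]:
            problems.append(f"{path}: {low}={schema[low]} exceeds {high}={schema[high]}")

    if "enum" in schema and len(schema["enum"]) == 0:
        problems.append(f"{path}: enum is empty")

    props = schema.get("properties", {})
    if schema.get("additionalProperties") is False:
        for name in schema.get("required", []):
            if name not in props:
                problems.append(f"{path}: required property '{name}' is not declared")

    # Recurse into nested object properties and array item schemas.
    for name, sub in props.items():
        if isinstance(sub, dict):
            problems.extend(find_impossible_constraints(sub, f"{path}.{name}"))
    items = schema.get("items")
    if isinstance(items, dict):
        problems.extend(find_impossible_constraints(items, f"{path}[]"))

    return problems


def build_request_kwargs(
    schema: dict[str, Any], max_tokens: int = 512, simple: bool = False
) -> dict[str, Any]:
    """Hypothetical helper: refuse bad schemas, then cap output tokens."""
    problems = find_impossible_constraints(schema)
    if problems:
        raise ValueError("schema rejected before API call: " + "; ".join(problems))

    if simple:
        # Plain JSON mode for simpler schemas, as suggested in the report.
        response_format: dict[str, Any] = {"type": "json_object"}
    else:
        response_format = {
            "type": "json_schema",
            "json_schema": {"name": "result", "strict": True, "schema": schema},
        }
    # A conservative cap so an unfulfillable request fails fast.
    return {"max_tokens": max_tokens, "response_format": response_format}
```

The returned dictionary would be merged with your model and messages arguments. Pick the cap from observed output sizes for your own schemas rather than from the default shown here.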
Journey Context:
Structured outputs (especially complex nested objects under strict validation) can send the model into retry loops when it generates JSON that fails schema validation. Unlike regular completions, structured output mode forces the model either to retry internally or to return an error after the tokens have already been burned. In practice, a request that should cost about 2K tokens can balloon to 10K+ tokens as the model makes multiple generation passes to satisfy the constraints. 'Impossible' schemas make this worse (e.g., asking for a 50-word summary while imposing a minimum length of 100 tokens).
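To make the overrun above concrete, here is a small sketch that compares the tokens a request actually consumed against what you budgeted for it. The function names and the 3.0 threshold are hypothetical (the threshold is taken from the lower end of the 3-5x range in this report). The actual count is assumed to come from the usage figures returned with the API response.

```python
def token_burn_ratio(expected_tokens: int, actual_tokens: int) -> float:
    """Ratio of tokens actually consumed to tokens budgeted for the request."""
    if expected_tokens <= 0:
        raise ValueError("expected_tokens must be positive")
    return actual_tokens / expected_tokens


def is_runaway(expected_tokens: int, actual_tokens: int, threshold: float = 3.0) -> bool:
    """Flag a request whose token use reached the threshold multiple of its budget."""
    return token_burn_ratio(expected_tokens, actual_tokens) >= threshold


# The example from the report: a 2K-token request that balloons to 10K tokens.
print(token_burn_ratio(2_000, 10_000))  # 5.0
print(is_runaway(2_000, 10_000))        # True
```

Logging this ratio per schema is one way to spot which schemas trigger the overrun before changing them.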
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T02:00:30.606896+00:00— report_created — created