Report #92503
[cost_intel] Structured output validation failures burning tokens on silent SDK retry loops
Set max_retries=0 so the SDK does not silently retry after 400 validation errors (note that this disables all automatic retries, not only those for 400s), implement client-side schema feasibility checks, and use 'strict': false in OpenAI to avoid automatic regeneration on format failure
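A client-side schema feasibility check could look roughly like the sketch below. It is a heuristic, not a full satisfiability test: the function name `find_contradictions` and the specific rules it checks (inverted bounds, empty `enum`, required-but-undeclared properties) are assumptions made for illustration, not part of any SDK.

```python
from typing import Any

# JSON Schema keyword pairs where lower > upper can never be satisfied.
_BOUND_PAIRS = [
    ("minimum", "maximum"),
    ("minLength", "maxLength"),
    ("minItems", "maxItems"),
    ("minProperties", "maxProperties"),
]


def find_contradictions(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return reasons why `schema` can never validate (hypothetical helper).

    Covers a few common mistakes only; an empty list does not prove the
    schema is satisfiable.
    """
    problems: list[str] = []

    # Inverted numeric, length, or size bounds.
    for low, high in _BOUND_PAIRS:
        if low in schema and high in schema and schema[low] > schema[high]:
            problems.append(
                f"{path}: {low} ({schema[low]}) > {high} ({schema[high]})"
            )

    # An empty enum accepts no value at all.
    if "enum" in schema and len(schema["enum"]) == 0:
        problems.append(f"{path}: enum is empty")

    # A required property that is not declared cannot be supplied when
    # extra properties are forbidden.
    props = schema.get("properties", {})
    closed = (
        schema.get("additionalProperties") is False
        and "patternProperties" not in schema
    )
    if closed:
        for name in schema.get("required", []):
            if name not in props:
                problems.append(
                    f"{path}: required property '{name}' is not declared "
                    "but additionalProperties is false"
                )

    # Recurse into nested object properties and array items.
    for name, sub in props.items():
        if isinstance(sub, dict):
            problems.extend(find_contradictions(sub, f"{path}.{name}"))
    items = schema.get("items")
    if isinstance(items, dict):
        problems.extend(find_contradictions(items, f"{path}[]"))

    return problems
```

Running this once before sending a request means an unsatisfiable schema fails locally instead of costing one or more billed calls.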
Journey Context:
When using Structured Outputs or JSON mode, if the model generates invalid JSON or its output fails schema validation, some SDKs automatically retry the request. Each retry is billed, and the retries are often not logged clearly. The report describes this as especially costly with 'strict': true in OpenAI's Structured Outputs, where the model is forced to conform to the schema and may loop. The trap is assuming the cost is one call when it is actually 3-4 calls because of retries. The fix is to disable SDK-level retries for 400 errors, check that your schema is actually satisfiable (not self-contradictory), and handle regeneration manually with a bounded number of attempts and exponential backoff rather than relying on automatic retry storms.
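The manual regeneration described above can be sketched as a bounded retry loop that reports how many calls were actually made. This is a minimal sketch under assumptions: `call_with_bounded_retries`, `StructuredOutputError`, and the `request` / `validate` callables are hypothetical names, and `request` is assumed to perform exactly one API call through a client built with retries disabled (for the OpenAI Python SDK, `OpenAI(max_retries=0)`).

```python
import json
import random
import time
from typing import Any, Callable


class StructuredOutputError(Exception):
    """Raised when no attempt produced valid output within the budget."""


def call_with_bounded_retries(
    request: Callable[[], str],
    validate: Callable[[Any], bool],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[Any, int]:
    """Call `request` until its JSON output passes `validate`.

    Returns (parsed_output, attempts_used) so the caller can log the real
    number of billed calls instead of assuming it was one.
    """
    last_error = "no attempts made"
    for attempt in range(1, max_attempts + 1):
        raw = request()  # exactly one API call per attempt (assumption)
        try:
            parsed = json.loads(raw)
            if validate(parsed):
                return parsed, attempt
            last_error = "schema validation failed"
        except json.JSONDecodeError as exc:
            last_error = f"invalid JSON: {exc}"

        if attempt < max_attempts:
            # Exponential backoff with jitter: base, 2*base, 4*base, ...
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))

    raise StructuredOutputError(
        f"gave up after {max_attempts} attempts: {last_error}"
    )
```

Because the attempt count is returned rather than hidden, it can be written to logs or cost dashboards next to the token usage of each call, which makes a 3-4x overrun visible.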
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:51:26.961184+00:00— report_created — created