Report #77379
[cost\_intel] Structured output validation failures triggering retries burn 10x tokens compared to accepting slightly messy outputs and parsing client-side
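Disable retries triggered by schema validation \(for example, max\_retries=0 in whichever wrapper performs them\); use response\_format with a lower temperature \+ a strong system prompt, then validate and patch client-side with Pydantic \(its default lax mode already coerces compatible types; there is no coerce=True flag\); retry only on API errors, never on schema validation failures. Note that the OpenAI SDK's own max\_retries setting governs API-error retries, so setting it to 0 there would disable the retries this advice wants to keep.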
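A minimal sketch of the client-side validate-and-patch step, assuming Pydantic v2. The Ticket schema, its field names, and the parse_leniently helper are hypothetical and only illustrate the idea: extra fields are dropped, wrong-case enum values are normalized, and string numbers are coerced, all without a second model call.

```python
import json
from typing import Literal, Optional

from pydantic import BaseModel, ConfigDict, ValidationError, field_validator


class Ticket(BaseModel):
    # Hypothetical schema for illustration; not taken from the report.
    # extra="ignore" is Pydantic's default, stated explicitly here:
    # unexpected fields are dropped instead of failing validation.
    model_config = ConfigDict(extra="ignore")

    title: str
    priority: Literal["low", "medium", "high"]
    estimate_hours: float  # lax mode coerces "3" or 3 to 3.0

    @field_validator("priority", mode="before")
    @classmethod
    def normalize_case(cls, value):
        # Repair wrong-case values such as "High" before validation runs.
        return value.strip().lower() if isinstance(value, str) else value


def parse_leniently(raw: str) -> Optional[Ticket]:
    """Return a Ticket, or None if the output is beyond cheap client-side repair."""
    try:
        return Ticket.model_validate(json.loads(raw))
    except (json.JSONDecodeError, ValidationError):
        # No retry here: the caller decides whether a lightweight
        # repair call is worth the tokens.
        return None


messy = '{"title": "Fix login", "priority": "High", "estimate_hours": "3", "notes": "extra"}'
ticket = parse_leniently(messy)
```

Returning None rather than raising keeps the decision about a follow-up call with the caller, which is where the cost trade-off lives.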
Journey Context:
Structured output pipelines on the OpenAI API often retry, inside a wrapper library or in hand-written code, whenever the returned JSON doesn't match the schema. Each retry resends the full context window. With complex schemas or weaker models \(GPT-3.5\), validation failures are common, and the retries can burn 5-10x the token cost of the initial call. It is cheaper to accept the output, parse it with a forgiving parser \(Pydantic in its default lax mode, which coerces compatible types\), and fix any remaining errors with a regex or a second lightweight call than to force strict validation on every request. The loss in correctness is small: most validation failures are trivial \(extra fields, wrong case\), and client-side code handles them better than a retry loop.
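The cost argument above can be made concrete with a rough token count. This is a sketch under stated assumptions: every number below is hypothetical, both function names are made up for illustration, and input and output tokens are counted equally even though real pricing usually differs between them.

```python
def retry_loop_cost(context_tokens: int, output_tokens: int, attempts: int) -> int:
    """Tokens billed when every attempt resends the full context."""
    return attempts * (context_tokens + output_tokens)


def accept_and_patch_cost(
    context_tokens: int,
    output_tokens: int,
    repair_input_tokens: int = 0,
    repair_output_tokens: int = 0,
) -> int:
    """Tokens billed for one call plus an optional lightweight repair call."""
    return (context_tokens + output_tokens) + (repair_input_tokens + repair_output_tokens)


# Hypothetical workload: 8,000-token context, 500-token JSON answer.
context, output = 8_000, 500

baseline = retry_loop_cost(context, output, attempts=1)  # single successful call
worst = retry_loop_cost(context, output, attempts=6)     # initial call + 5 retries
patched = accept_and_patch_cost(
    context,
    output,
    repair_input_tokens=700,   # messy JSON plus a short fix-it instruction
    repair_output_tokens=500,  # corrected JSON
)

print(worst / baseline)              # 6.0
print(round(patched / baseline, 2))  # 1.14
```

Under these assumptions the repair call only sees the messy output, not the original context, which is why it adds a small fraction of the baseline cost while each retry adds a full multiple.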
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:28:37.034160+00:00— report_created — created