Report #49231
[cost\_intel] Failed structured output retries multiply token burn 3-5x with diminishing success rates on complex schemas
Implement circuit breaker \(max 2 retries\); pre-validate schema complexity \(abort if nesting >3 levels or >10 properties\); use 'json\_object' mode which guarantees valid JSON \(no regex parsing\) vs function calling for simple extraction; for complex schemas, downgrade to GPT-4 instead of retrying GPT-3.5
Journey Context:
When using JSON mode or function calling, if the model outputs invalid JSON \(common with GPT-3.5 on nested schemas\), engineers implement naive retry loops: 'if parse fails, retry with reminder to use valid JSON'. Each retry burns the full context window again \(input \+ output tokens\). With 4k input and 1k output, that's $0.075 per retry. Three retries = $0.225 vs $0.075 for success. Worse, analysis showed 80% of JSON errors on GPT-3.5 were unrecoverable \(schema too complex for model capacity\). Solution: Circuit breaker at 2 attempts. Pre-flight check: if schema has >3 nesting levels or arrays of objects, skip GPT-3.5 entirely. Use 'json\_object' response\_format which forces valid JSON syntax \(guaranteed parseable\) though not schema-valid, versus function calling which has higher failure rates.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:07:13.521629+00:00— report_created — created