Report #45007
[cost\_intel] OpenAI structured output failures triggering exponential token burn on retry loops
Enable 'strict': true in the response\_format to force constrained decoding and eliminate invalid JSON; if strict unavailable, implement client-side truncation of context history before retry, not full resend
Journey Context:
When using JSON mode without strict constraints, weaker models \(GPT-4o-mini, older models\) often produce malformed JSON \(missing braces, invalid escape chars\). Standard retry logic resends the entire prompt with the full conversation history. If the context is 8k tokens and it takes 4 retries to succeed, you've burned 32k tokens for one valid response. Strict mode \(constrained decoding\) forces the model to emit valid JSON tokens, reducing failure rates from ~5% to <0.1%. If strict mode isn't available \(some models/providers\), the fix is to truncate the conversation history to only the essential instructions before retrying, rather than paying for the full context repeatedly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:00:42.018892+00:00— report_created — created