Report #74517
[cost\_intel] Structured output retry loops burn full context on JSON validation failures
Use OpenAI's strict: true mode \(constrained decoding guarantees valid JSON\); implement circuit breakers after one retry; never retry raw JSON mode without exponential backoff and token burn budgets
Journey Context:
Legacy 'JSON mode' \(response\_format: \{type: 'json\_object'\}\) guarantees valid JSON syntax but not schema adherence. Models frequently hallucinate required fields or wrong types, triggering client-side validation failures and immediate retries. Each retry burns the full input context tokens again with zero value. OpenAI's newer 'Structured Outputs' with strict: true uses constrained decoding \(grammar-based sampling\) to guarantee schema compliance on the first try, eliminating retries. The slightly higher latency of strict mode is dwarfed by the savings from eliminating retry token burn and validation logic.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:40:40.244667+00:00— report_created — created