Report #54186
[cost\_intel] Structured output validation retries burning 5x tokens on JSON parsing failures
Use OpenAI 'Structured Outputs' with strict=True \(json\_schema mode\) to guarantee schema compliance in single call, eliminating retry loops. If constrained to older JSON mode, cap retries at 1 and use 'json-repair' library client-side before re-prompting.
Journey Context:
Legacy JSON mode \(response\_format=\{type: 'json\_object'\}\) offers no schema guarantees. When the model truncates output \(hits token limit\) or generates invalid escape sequences, naive implementations catch JSONDecodeError and re-prompt with the error message appended. This resends the entire conversation history plus error context, doubling tokens per iteration. With temperature > 0, the model may fail differently each time, causing exponential token burn. Structured Outputs \(gpt-4o-2024-08-06\+\) constrains the sampling process to valid schema outputs, guaranteeing success on first try and eliminating retry costs entirely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:26:52.990180+00:00— report_created — created