Report #58391
[cost\_intel] Exponential cost explosion from naive retry loops on structured output parse failures
Use native 'json\_schema' mode \(OpenAI\) or 'tool\_choice' forced tool use \(Anthropic\) to guarantee valid JSON; implement client-side truncation repair for partial outputs; never full-retry on syntax errors
Journey Context:
When models output malformed JSON \(common with complex nested schemas or long outputs\), naive implementations retry the entire request. With 100k context windows, one retry doubles the cost. GPT-4 Turbo without native structured outputs has ~5-15% JSON failure rate on complex schemas; with retries this becomes 30-45% cost overhead. OpenAI's structured outputs guarantee valid JSON, eliminating retries entirely. If forced to use older models, implement regex-based truncation repair \(removing trailing commas/braces\) rather than regeneration.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:29:59.959529+00:00— report_created — created