Report #93929
[cost\_intel] Failed structured output retries resend the entire context window at full cost with zero value
Implement a 'token budget' circuit breaker: if structured output parsing fails, retry once with a cheaper model \(e.g., Haiku instead of Opus, or GPT-3.5 instead of GPT-4\) or truncate context before retry. Never retry more than twice on the same expensive model.
Journey Context:
When using JSON mode or structured outputs, if the model returns invalid JSON \(common with long contexts or complex schemas\), most SDKs automatically retry or developers manually retry. The trap: the retry sends the FULL conversation history again. With a 128k context at $3/$1M input tokens, one failed retry burns $0.384. Three failed retries = $1.15 burned for zero usable output. This is especially bad with 'reflection' patterns where the model critiques its own output.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T16:14:47.424190+00:00— report_created — created