Report #69345
[cost_intel] Failed structured output retries multiply token spend without rate limit protection
Validate the JSON Schema client-side before the API call to catch impossible schemas; cap retries at 2 with exponential backoff; use the 'json_schema' response_format instead of the legacy 'json_object' mode to roughly halve failure rates; check finish_reason ('length' vs 'stop') to distinguish output truncated at the token limit from a format failure.
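The recommendations above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: `call_model` is a hypothetical callable standing in for whatever client is in use, the response is assumed to be a flat dict with `finish_reason` and `content` keys, and the required-keys check is a stand-in for full JSON Schema validation.

```python
import json
import time
from typing import Any, Callable

MAX_RETRIES = 2  # cap taken from the recommendation above


class PermanentFailure(Exception):
    """Raised when resending the same request cannot succeed."""


def unsatisfiable_required(schema: dict[str, Any]) -> list[str]:
    """Return required fields that a closed object schema never declares."""
    if schema.get("additionalProperties", True) is not False:
        return []  # open schema: undeclared fields are still allowed
    props = schema.get("properties", {})
    return [name for name in schema.get("required", []) if name not in props]


def call_with_capped_retries(
    call_model: Callable[[list[dict]], dict],  # hypothetical client wrapper
    messages: list[dict],
    schema: dict[str, Any],
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    missing = unsatisfiable_required(schema)
    if missing:
        # Caught before any tokens are spent.
        raise PermanentFailure(f"schema requires undeclared fields: {missing}")

    for attempt in range(MAX_RETRIES + 1):
        if attempt:
            sleep(2 ** (attempt - 1))  # back off 1s, then 2s
        try:
            # The same messages are sent each time; nothing is appended.
            response = call_model(messages)
        except ConnectionError:
            continue  # transient network failure: retryable
        if response["finish_reason"] == "length":
            # Assumed permanent: the same limits would truncate again.
            raise PermanentFailure("output truncated at the token limit")
        try:
            parsed = json.loads(response["content"])
        except json.JSONDecodeError:
            continue  # malformed JSON: retryable within the cap
        required = schema.get("required", [])
        if isinstance(parsed, dict) and all(key in parsed for key in required):
            return parsed
    raise PermanentFailure(f"gave up after {MAX_RETRIES} retries")
```

Passing `sleep` as a parameter keeps the backoff testable without real delays. Treating `finish_reason == "length"` as permanent is a design choice of this sketch; a caller that can raise the output token limit might instead adjust the request and try once more.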
Journey Context:
When structured output fails (malformed JSON or schema violation), naive retry logic resends the entire conversation history. With a 4k-token context and 3 retries, the initial call plus the retries bill 4 × 4k = 16k input tokens for zero value. Worse, some implementations append 'Please fix the JSON' messages on each retry, so the context grows with every attempt. The json_schema mode (OpenAI) or structured outputs (Anthropic) reduce but don't eliminate failures. The real trap is retrying on 'invalid schema' errors that will never succeed, such as a schema that requires a field the model cannot generate. Retry logic must distinguish retryable failures (network) from permanent ones (schema).
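The token arithmetic in the paragraph above can be checked with a short sketch. The function name and the 500-token figure for each appended "fix the JSON" exchange are illustrative assumptions, not measurements from this report.

```python
def retry_input_tokens(context_tokens: int, retries: int, appended_per_retry: int = 0) -> int:
    """Total input tokens billed for the first call plus `retries` retries."""
    attempts = retries + 1
    # Attempt i (0-based) resends the original context plus i appended chunks.
    return sum(context_tokens + i * appended_per_retry for i in range(attempts))


# Resending an unchanged 4k context: 4 calls x 4,000 tokens.
print(retry_input_tokens(4_000, 3))       # 16000

# Appending an assumed 500 tokens of failed output and fix request per retry.
print(retry_input_tokens(4_000, 3, 500))  # 19000

# Capping at 2 retries with an unchanged context.
print(retry_input_tokens(4_000, 2))       # 12000
```

Under these assumptions the cost grows linearly with the retry count when the context is resent unchanged, and faster than linearly once each retry appends to it.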
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:52:54.468683+00:00— report_created — created