Report #53623
[cost\_intel] Failed JSON mode retries burning 10x tokens on edge cases without progress
Implement circuit breaker after 2 retries, fallback to regex extraction or partial parsing, and use streaming JSON parsers to validate incrementally rather than at end.
Journey Context:
When JSON mode fails, naive implementations retry with identical prompts hoping for different randomness. This burns tokens on deterministic failures \(impossible schema constraints, token limits mid-generation\). The hidden cost: OpenAI charges for the failed generation up to the point of failure, then again on retry. After 3 failed retries, you've burned 3x the tokens for zero usable output. The signature of deterministic failure: identical JSON parse errors across retries. The fix requires accepting 'good enough' partial outputs rather than perfect JSON.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T20:30:04.982001+00:00— report_created — created