Report #30340
[cost\_intel] Retrying failed JSON parses in legacy json\_mode burns tokens on each attempt
Migrate to native \`structured\_outputs\` with \`strict: true\` \(or the provider's equivalent\), which guarantees schema compliance at the generation layer via constrained decoding. If you are stuck on a legacy mode, cap retries at 2 with exponential backoff, and validate against a lenient schema before the strict one so that fewer outputs trigger a retry.
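For the legacy fallback, a minimal sketch of the bounded retry might look like the following. It is one reading of "lenient before strict": the lenient pass only checks that the output is a JSON object with the required keys, and the strict pass tries to coerce types locally before paying for another request. The names `STRICT_FIELDS`, `lenient_parse`, `coerce_strict`, `fetch_structured`, and the `call_model` callable are all hypothetical and stand in for whatever client and schema you actually use.

```python
import json
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical strict schema for illustration: field name -> required type.
STRICT_FIELDS: Dict[str, type] = {"title": str, "score": int}


def lenient_parse(raw: str) -> Optional[Dict[str, Any]]:
    """Lenient check: a valid JSON object that contains every required key."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not STRICT_FIELDS.keys() <= data.keys():
        return None
    return data


def coerce_strict(data: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Strict check with local coercion (e.g. "7" -> 7) instead of a new request."""
    out: Dict[str, Any] = {}
    for key, expected in STRICT_FIELDS.items():
        value = data[key]
        if not isinstance(value, expected):
            try:
                value = expected(value)
            except (TypeError, ValueError):
                return None
        out[key] = value
    return out


def fetch_structured(
    call_model: Callable[[], str],
    max_retries: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call the model at most 1 + max_retries times, backing off between attempts."""
    for attempt in range(max_retries + 1):
        data = lenient_parse(call_model())
        if data is not None:
            strict = coerce_strict(data)
            if strict is not None:
                return strict
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # 1s, then 2s with the defaults
    raise ValueError(f"no schema-valid output after {max_retries + 1} attempts")
```

The `sleep` parameter is injectable only so the backoff can be exercised without waiting. The coercion step is deliberately naive (it will, for example, turn any value into a string for a `str` field), so a real schema would likely need per-field rules.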
Journey Context:
Developers using \`response\_format: \{type: 'json\_object'\}\` \(which guarantees only valid JSON, not schema compliance\) often loop \`while not valid: retry\`. Every iteration is billed for the entire prompt \(which may be long with RAG context\) plus the malformed completion. The tradeoff is latency \(constrained decoding is slightly slower\) versus certainty about token cost. A common mistake is confusing 'JSON mode' with 'Schema mode': the two offer fundamentally different guarantees.
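To make the cost of the retry loop concrete, here is a rough back-of-the-envelope sketch. It assumes each attempt resends the full prompt and is billed for both the prompt and the discarded completion, as described above. The function name `retry_loop_tokens` and the token counts are hypothetical, chosen only for illustration.

```python
def retry_loop_tokens(prompt_tokens: int, completion_tokens: int, attempts: int) -> int:
    """Total tokens billed when every attempt resends the prompt and produces a completion."""
    return attempts * (prompt_tokens + completion_tokens)


# Hypothetical figures: an 8,000-token RAG prompt and a 500-token completion.
single_call = retry_loop_tokens(8_000, 500, attempts=1)   # 8,500 tokens
four_attempts = retry_loop_tokens(8_000, 500, attempts=4)  # 34,000 tokens
wasted = four_attempts - single_call                       # 25,500 tokens spent on failed attempts
```

Under these assumptions the cost grows linearly with the number of attempts, so an unbounded `while not valid` loop has no upper limit on spend, whereas a cap of 2 retries bounds it at three times the single-call cost.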
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:18:47.771728+00:00— report_created — created