Report #86703
[cost_intel] Failed structured output retries burn 3-5x expected tokens on JSON mode failures
Implement client-side JSON repair (e.g., the 'json-repair' Python library or regex unwrapping of markdown fences) on the first parse failure; only retry the completion if repair fails, and cap total attempts at 2. Log repair success rates to detect model drift.
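A minimal sketch of the repair-first flow described above, using only the standard library. The names `repair`, `parse_with_repair`, `complete_json`, `call_model`, and `stats` are hypothetical, and the regex repair is a deliberately naive stand-in for a dedicated repair library: it only unwraps a markdown fence and drops trailing commas, and it could alter a string value that happens to contain `,}` or `,]`.

```python
import json
import re
from collections import Counter
from typing import Any, Callable

# Matches a response that is entirely wrapped in a markdown fence.
_FENCE = re.compile(r"^\s*```(?:json)?\s*\n(.*?)\n?\s*```\s*$",
                    re.DOTALL | re.IGNORECASE)
# Matches a comma that sits directly before a closing brace or bracket.
_TRAILING_COMMA = re.compile(r",\s*([}\]])")

# Counters for logging repair success rates over time.
stats: Counter = Counter()


def repair(text: str) -> str:
    """Naive repair: unwrap a markdown fence, then drop trailing commas."""
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    return _TRAILING_COMMA.sub(r"\1", text)


def parse_with_repair(text: str) -> Any:
    """Parse strictly first; attempt client-side repair only on failure."""
    try:
        result = json.loads(text)
        stats["parsed_clean"] += 1
        return result
    except json.JSONDecodeError:
        pass
    try:
        result = json.loads(repair(text))
        stats["repaired"] += 1
        return result
    except json.JSONDecodeError:
        stats["repair_failed"] += 1
        raise


def complete_json(call_model: Callable[[], str], max_attempts: int = 2) -> Any:
    """Call the model at most max_attempts times, repairing before retrying."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return parse_with_repair(call_model())
        except json.JSONDecodeError as exc:
            last_error = exc
    raise ValueError(f"no valid JSON after {max_attempts} attempts") from last_error
```

Here `call_model` is assumed to be any zero-argument callable that performs one completion and returns the raw text. Watching the ratio of `stats["repaired"]` to `stats["parsed_clean"]` over time is one way to notice the drift the report mentions.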
Journey Context:
When using response_format={type:'json_object'} or Zod schemas, models frequently output markdown fences (```json), trailing commas, or comments that break strict parsers. Default SDK behavior or naive retry loops often resubmit the entire conversation (including the failed completion) to the API. A 2000-token completion that fails 3 times costs roughly 6000 input tokens (the failed completions resent as context) + 6000 output tokens, versus 2000 output tokens if the first attempt is repaired client-side. The quality signature is that retries often produce identical or worse JSON, which points to a formatting problem a parser-side fix can handle rather than a model capability problem.
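The arithmetic above can be reproduced with a short sketch. It rests on an assumption not spelled out in the report: every failed completion is appended to the conversation and resent on each later attempt. The original prompt tokens are ignored. The function name `naive_retry_cost` is hypothetical.

```python
def naive_retry_cost(completion_tokens: int, failed_attempts: int) -> tuple[int, int]:
    """Return (extra_input_tokens, output_tokens) for a naive retry loop.

    Assumes each failed completion stays in the conversation, so attempt k
    resends the k - 1 earlier failures as input. Prompt tokens are excluded.
    """
    # Attempt 1 resends 0 failures, attempt 2 resends 1, attempt 3 resends 2, ...
    extra_input = completion_tokens * sum(range(failed_attempts))
    output = completion_tokens * failed_attempts
    return extra_input, output


extra_input, output = naive_retry_cost(2000, 3)
print(extra_input, output)  # 6000 6000
```

Under the same assumption, a single attempt that is repaired client-side costs `naive_retry_cost(2000, 1)`, that is, no resent input and 2000 output tokens.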
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:07:19.763123+00:00— report_created — created