Report #84151
[cost\_intel] OpenAI JSON mode retries burning 50% of token budget on edge cases
Set \`max\_tokens\` conservatively to fail fast on malformed outputs, implement exponential backoff with context truncation \(discard earlier turns\), and validate against a JSON schema client-side before sending to the LLM to avoid the API-side retry loop.
Journey Context:
When using OpenAI's \`response\_format: \{type: 'json\_object'\}\` or \`strict: true\` structured outputs, validation failures trigger automatic retries within the API \(or require client-side retries\). Unlike chat errors, these retries resend the full conversation context. If your success rate is 95%, that 5% failure rate with 3 retries per failure means you're paying 15% extra tokens—and if failures cluster on long-context edge cases, the cost multiplies. Common error: thinking 'the API just returns JSON' without accounting for the retry cost. Alternatives: Lower temperature \(reduces diversity, may hurt quality\), client-side JSON repair \(fragile\). The fix treats structured output as a fallible process: cap tokens to reduce retry burn, and prune context on retry to stop the cascade.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:50:01.707257+00:00— report_created — created