Report #31285
[cost_intel] Failed structured-output retries multiply token costs by resending the full context on each attempt
Use 'strict': true mode to guarantee schema conformance at the API level, eliminating retry loops; if strict mode is unavailable, validate JSON client-side before making another API call, so that a retry is paid for only when it is actually needed
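A minimal sketch of the strict-mode setup, assuming OpenAI's `json_schema` response format. The schema (`TICKET_SCHEMA`) and the helper names are hypothetical and exist only for illustration. The local check covers just two strict-mode requirements (every object sets `additionalProperties` to false and lists all of its properties in `required`); it is not a full validator, and no API call is made here.

```python
def strict_schema_problems(schema, path="$"):
    """List reasons this schema would likely be rejected in strict mode.

    Only two rules are checked (an assumption of this sketch, not the full rule set).
    """
    problems = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: properties not listed in required: {missing}")
        # Nested objects must satisfy the same rules.
        for name, sub in props.items():
            problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


# Hypothetical schema, for illustration only.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "priority": {"type": "string", "enum": ["low", "high"]},
    },
    "required": ["title", "priority"],
    "additionalProperties": False,
}


def build_response_format(name, schema):
    """Build the response_format payload, failing locally if the schema looks invalid."""
    problems = strict_schema_problems(schema)
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }
```

The resulting dictionary would be passed as the `response_format` argument of a chat completion request. Checking the schema locally first means a malformed schema is caught before any request is sent.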
Journey Context:
When using JSON mode or structured outputs, if the model generates invalid JSON (syntax errors, missing fields), the common pattern is to catch the exception, append an error message ('Invalid JSON, fix it'), and retry. Each retry resends the ENTIRE conversation context (potentially thousands of tokens) plus the failed generation, so the cost grows with every attempt. Three retries on a 4k-token context resend at least 12k input tokens, on top of the original request and the output tokens of each failed generation. Worse, streaming makes this invisible: you pay for the tokens already streamed before validation fails. The trap is assuming 'the API will validate for free'. It does not: the tokens are generated and billed first, and validation only happens afterwards. Solution: OpenAI's 'strict': true mode constrains the sampling process itself to output that matches the supplied schema, so schema-related retries are no longer needed (a refusal or a response truncated at the token limit can still produce unusable output and should be handled separately). If using non-strict modes, implement client-side validation against the schema before sending anything back to the API, to avoid unnecessary round-trips.
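The retry arithmetic and the client-side check can be sketched as follows. This is a rough cost model, not a billing formula: it assumes every failed attempt produces the same number of output tokens and that each failed generation plus the error message stays in the conversation. The function names and the required-field check are hypothetical; a full JSON Schema validator could replace `local_problems`.

```python
import json


def retry_input_tokens(context_tokens, failed_output_tokens, error_msg_tokens, retries):
    """Estimate the input tokens resent across all retries (original call excluded)."""
    total = 0
    carried = 0  # failed generations and error messages accumulated so far
    for _ in range(retries):
        carried += failed_output_tokens + error_msg_tokens
        total += context_tokens + carried
    return total


def local_problems(raw, required_fields):
    """Check a model response locally; returns an empty list when it looks usable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    return [f"missing field: {name}" for name in required_fields if name not in data]


# Context alone: 3 retries x 4000 tokens.
print(retry_input_tokens(4000, 0, 0, 3))     # 12000
# With a 500-token failed generation and a 20-token error message kept each time.
print(retry_input_tokens(4000, 500, 20, 3))  # 15120
print(local_problems('{"title": "x"}', ["title", "priority"]))  # ['missing field: priority']
```

Running the local check first makes the decision to retry explicit, and the list of problems can be logged alongside the estimated cost of another attempt.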
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:53:56.087760+00:00— report_created — created