Report #82832
[cost_intel] Failed structured output retries burn 100% of generation tokens with no value
Use Structured Outputs with 'strict': true (constrained decoding) so that responses conform to the schema and format-driven retries become unnecessary; refusals and output truncated by the token limit can still occur and should be handled explicitly. If you are limited to JSON mode ('response_format': {'type': 'json_object'}), prefer a single-attempt policy with graceful degradation over a retry loop; where validation is complex enough to justify retrying, cap it at 1 retry. Track 'completion_tokens' per attempt to measure the burn rate.
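The recommendation above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the helper names (build_strict_request, parse_once, BurnTracker) are hypothetical, the model name passed in is a placeholder, and no network call is made. The payload shape assumes the Chat Completions 'response_format' of type 'json_schema'; the token counts fed to the tracker are assumed to come from the 'completion_tokens' field of each response's usage data.

```python
import json
from dataclasses import dataclass


def build_strict_request(model: str, messages: list, schema_name: str, schema: dict) -> dict:
    """Build keyword arguments for a Chat Completions call with strict Structured Outputs.

    Hypothetical helper: it only assembles the payload and sends nothing.
    """
    return {
        "model": model,
        "messages": messages,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": schema_name, "strict": True, "schema": schema},
        },
    }


def parse_once(raw: str, required: tuple, fallback: dict) -> tuple:
    """Single-attempt policy: return (data, True) if usable, else (fallback, False).

    There is deliberately no retry here; the caller degrades to the fallback.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return fallback, False
    if not isinstance(data, dict) or any(key not in data for key in required):
        return fallback, False
    return data, True


@dataclass
class BurnTracker:
    """Accumulates completion tokens per attempt, split into useful and wasted."""

    useful_tokens: int = 0
    wasted_tokens: int = 0

    def record(self, completion_tokens: int, valid: bool) -> None:
        if valid:
            self.useful_tokens += completion_tokens
        else:
            self.wasted_tokens += completion_tokens

    @property
    def burn_rate(self) -> float:
        """Fraction of all completion tokens spent on attempts that were discarded."""
        total = self.useful_tokens + self.wasted_tokens
        return self.wasted_tokens / total if total else 0.0
```

In use, the dictionary from build_strict_request would be unpacked into the client call, each response body would go through parse_once, and its completion token count would be passed to BurnTracker.record together with the validity flag. Strict mode expects every property to be listed in 'required' and 'additionalProperties' set to false in the schema.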
Journey Context:
Developers often enable JSON mode, validate the output with Pydantic or Zod, and retry on validation failure. Each retry resends the full prompt context and pays for a fresh generation. With complex schemas, small models fail 20-30% of the time, so 20-30% of attempts waste 100% of their tokens. OpenAI's 'strict': true mode uses constrained decoding (grammar-based sampling) to force schema-valid JSON at the token level, reducing failure rates to <1% and removing the need for format-driven retries. The cost of a single attempt is unchanged, but total cost drops significantly because almost no attempts are wasted.
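The cost argument above can be made concrete with a small calculation. The sketch below assumes that failures are independent across attempts, that every attempt uses the same number of prompt and completion tokens, and that a request stops at the first valid output or when the attempt cap is reached. The function names and the token counts in the example are illustrative, not measurements.

```python
def expected_attempts(p_fail: float, max_attempts: int) -> float:
    """Expected number of attempts per request.

    Attempt k (1-based) only happens if the previous k-1 attempts all failed,
    which has probability p_fail ** (k - 1).
    """
    return sum(p_fail ** k for k in range(max_attempts))


def expected_tokens(p_fail: float, max_attempts: int,
                    prompt_tokens: int, completion_tokens: int) -> float:
    """Expected total tokens billed per request, counting every attempt."""
    per_attempt = prompt_tokens + completion_tokens
    return expected_attempts(p_fail, max_attempts) * per_attempt


def wasted_tokens(p_fail: float, max_attempts: int,
                  prompt_tokens: int, completion_tokens: int) -> float:
    """Expected tokens spent on failed attempts per request.

    Attempt k fails with probability p_fail ** k overall (it happens and fails).
    """
    per_attempt = prompt_tokens + completion_tokens
    failed = sum(p_fail ** k for k in range(1, max_attempts + 1))
    return failed * per_attempt


if __name__ == "__main__":
    # Illustrative inputs only: 1000 prompt tokens, 200 completion tokens.
    for label, p, cap in [("JSON mode, retry loop", 0.30, 3),
                          ("strict mode, single attempt", 0.01, 1)]:
        total = expected_tokens(p, cap, 1000, 200)
        waste = wasted_tokens(p, cap, 1000, 200)
        print(f"{label}: expected tokens={total:.1f}, wasted={waste:.1f}")
```

Under these assumptions the share of billed tokens that is wasted is roughly the per-attempt failure probability, so lowering the failure rate matters more than tuning the retry cap.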
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T21:37:32.439130+00:00— report_created — created