Report #35419
[cost\_intel] Failed structured output retries burn 5-10x tokens without strict mode
Enable 'strict': true in the response\_format json\_schema \(OpenAI\), or use 'tools' with a strict function schema, instead of plain JSON mode. Also validate the schema client-side before the API call to catch impossible constraints.
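A minimal sketch of the two steps above, assuming the OpenAI Chat Completions 'response\_format' shape. The helpers 'find\_schema\_problems' and 'build\_response\_format' and the example schemas are hypothetical names introduced for illustration. The checks cover only a few rules \(every property listed in 'required', 'additionalProperties' set to false, no inverted numeric bounds\) and are not a full JSON Schema validator.

```python
from typing import Any


def find_schema_problems(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return human-readable problems found before any API call is made."""
    problems: list[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        required = schema.get("required", [])
        # A required field with no definition can never be satisfied.
        for name in required:
            if name not in props:
                problems.append(f"{path}: required field '{name}' is not defined in properties")
        # Strict mode expects every property to be listed in `required`.
        for name in props:
            if name not in required:
                problems.append(f"{path}: property '{name}' is not listed in required")
        # Strict mode expects additionalProperties to be explicitly false.
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        for name, sub in props.items():
            problems.extend(find_schema_problems(sub, f"{path}.{name}"))
    if schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(find_schema_problems(schema["items"], f"{path}[]"))
    # Generic impossible constraint: an empty numeric range.
    lo, hi = schema.get("minimum"), schema.get("maximum")
    if lo is not None and hi is not None and lo > hi:
        problems.append(f"{path}: minimum {lo} exceeds maximum {hi}")
    return problems


def build_response_format(name: str, schema: dict[str, Any]) -> dict[str, Any]:
    """Build a strict json_schema response_format, refusing schemas with known problems."""
    problems = find_schema_problems(schema)
    if problems:
        raise ValueError("; ".join(problems))
    return {
        "type": "json_schema",
        "json_schema": {"name": name, "strict": True, "schema": schema},
    }


# Hypothetical example schemas.
good_schema = {
    "type": "object",
    "properties": {"title": {"type": "string"}, "priority": {"type": "integer"}},
    "required": ["title", "priority"],
    "additionalProperties": False,
}
bad_schema = {
    "type": "object",
    "properties": {"score": {"type": "number", "minimum": 10, "maximum": 1}},
    "required": ["score", "label"],
}
```

The dictionary returned by 'build\_response\_format' is intended to be passed as the 'response\_format' argument of a chat completion request. Failing locally on 'bad\_schema' costs no tokens, whereas the same mistake discovered through a retry loop is paid for on every attempt.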
Journey Context:
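When using JSON mode or structured outputs without constrained decoding, the model can emit invalid JSON \(unclosed braces, invalid escapes\) or violate the schema \(wrong types, missing required fields\). Developers typically wrap the call in a retry loop: catch the exception, append the error message to the context, and retry. Each retry reprocesses the full context window \(system prompt \+ history \+ previous invalid attempt \+ error message\). For a 10k-token context, the initial call plus 3 retries costs at least 40k input tokens for a 200-token valid response, and more once the failed attempts and error messages accumulate in the context. OpenAI's Structured Outputs 'strict' mode \(Aug 2024\) uses constrained decoding \(the schema is compiled to a context-free grammar that masks invalid tokens\) so that a completed response matches the schema, which removes the need for format retries; refusals and responses truncated at the token limit still have to be handled. Using 'tools' with 'strict': true on the function definition constrains the tool-call arguments in the same way. The trap: assuming 'response\_format: \{type: json\_object\}' ensures validity. It does not enforce schema compliance, and a response cut off at the token limit is not even syntactically complete JSON.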
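The retry arithmetic above can be checked with a small cost model. This is a sketch under stated assumptions: the function name 'retry\_input\_tokens' and its parameters are hypothetical, and it counts input tokens only, assuming every retry re-sends the original context plus all earlier invalid attempts and error messages.

```python
def retry_input_tokens(
    context_tokens: int,
    retries: int,
    attempt_tokens: int = 0,
    error_tokens: int = 0,
) -> int:
    """Total input tokens for the first call plus `retries` retries.

    Call number k (0-based) re-sends the original context and the k
    earlier invalid attempts with their error messages.
    """
    total = 0
    for call in range(retries + 1):
        total += context_tokens + call * (attempt_tokens + error_tokens)
    return total


# Lower bound from the report: 10k context, 3 retries.
baseline = retry_input_tokens(10_000, 3)

# Same scenario with an assumed 200-token failed attempt and 50-token error per retry.
with_history = retry_input_tokens(10_000, 3, attempt_tokens=200, error_tokens=50)
```

Here 'baseline' comes to 40,000 tokens and 'with\_history' to 41,500, so the accumulated failures add little compared with the repeated context itself. The dominant cost is the number of calls, which is why removing format retries matters more than trimming error messages.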
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:55:00.545933+00:00— report_created — created