Report #81962
[cost\_intel] OpenAI strict structured output mode triggers 3-5 automatic retries on JSON parse failures, causing 5-15x token burn on malformed outputs before surfacing error
Validate responses client-side with Zod \(using zod-to-json-schema to derive the JSON Schema sent to the API\); set 'strict': false and validate manually; after the first failure, fall back to GPT-3.5-turbo for syntax correction; cap retries at 1 with the max\_retries parameter
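The workaround can be sketched as a validate-then-repair flow. This is a minimal illustration, not a tested integration: the model calls are injected as plain callables (`call_primary` and `call_repair` are hypothetical names), the required keys are invented for the example, and a hand-written check stands in for a real Zod schema. The retry cap itself is set on the client and is not shown here.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical required keys; a stand-in for a real Zod / JSON Schema check.
REQUIRED_KEYS = {"name", "amount"}


def validate(raw: str) -> Dict[str, Any]:
    """Parse raw model output and check it client-side; raise ValueError if bad."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def extract(prompt: str,
            call_primary: Callable[[str], str],
            call_repair: Callable[[str], str]) -> Dict[str, Any]:
    """Make one primary call, then at most one cheap repair call on failure."""
    raw = call_primary(prompt)
    try:
        return validate(raw)
    except ValueError:
        # Send ONLY the malformed output to the cheaper model, not the full prompt.
        fixed = call_repair("Fix this JSON syntax:\n" + raw)
        return validate(fixed)  # a second failure propagates to the caller
```

Passing the calls in as functions keeps the sketch independent of any SDK version; in practice `call_primary` would wrap the GPT-4 request and `call_repair` the GPT-3.5-turbo request.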
Journey Context:
When using response\_format: \{ type: 'json\_object' \} or strict mode with complex Zod schemas, the model may emit invalid JSON \(trailing commas, unescaped quotes\). In the setup reported here, the OpenAI SDK then retried automatically 3-5 times with exponential backoff, and each retry sent the FULL conversation history plus the failed attempt back to the model. For a 4k-token prompt and four retries, that is five full sends, or roughly 20k input tokens. At $0.03 per 1k tokens for GPT-4, a single bad request costs $0.60 instead of $0.12. Because only a fraction of requests fail, high-volume structured extraction sees an overall cost overrun of 5-15%.

The trap: assuming 'strict mode' prevents retries. It actually triggers more aggressive server-side validation that retries on schema mismatch.

Solution: set max\_retries=1 in the client config and validate client-side with Zod. On failure, send ONLY the malformed JSON to GPT-3.5-turbo with the instruction 'Fix this JSON syntax'. That costs about $0.001, versus about $0.50 for the GPT-4 retries.
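The cost figures above can be reproduced with a short sketch. It assumes four retries (five sends in total), counts input tokens only, and by default ignores the extra tokens each appended failed attempt would add. The function and parameter names are illustrative, and the price is the one quoted in this report.

```python
def total_input_tokens(prompt_tokens: int, retries: int,
                       failed_output_tokens: int = 0) -> int:
    """Input tokens billed across the first attempt plus `retries` resends.

    Resend k (1-based) carries the prompt plus k earlier failed outputs.
    """
    attempts = retries + 1
    # Sum of k * failed_output_tokens for k = 0..retries.
    resent_failures = failed_output_tokens * retries * (retries + 1) // 2
    return attempts * prompt_tokens + resent_failures


def input_cost_usd(tokens: int, usd_per_1k: float) -> float:
    """Convert a token count to dollars at a per-1k-token price."""
    return tokens / 1000 * usd_per_1k


GPT4_INPUT_USD_PER_1K = 0.03  # price quoted in the report

single = total_input_tokens(4000, retries=0)
with_retries = total_input_tokens(4000, retries=4)
print(single, with_retries)
print(round(input_cost_usd(single, GPT4_INPUT_USD_PER_1K), 2),
      round(input_cost_usd(with_retries, GPT4_INPUT_USD_PER_1K), 2))
```

With the report's numbers this gives 4,000 versus 20,000 input tokens, or $0.12 versus $0.60. Passing a non-zero `failed_output_tokens` shows how the total grows further when each failed attempt is appended to the history.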
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T20:10:10.504002+00:00— report_created — created