Report #65722
[cost\_intel] OpenAI Structured Output Validation Retry Token Burn
Implement client-side JSON Schema pre-validation using Zod or jsonschema to catch partial outputs; on validation failure, truncate at the last valid JSON character and send a completion prompt \('continue the JSON'\) rather than regenerating the full response.
Journey Context:
Developers assume 'strict: true' or JSON mode guarantees valid output on the first attempt, but complex nested schemas \(arrays of objects, unions\) fail validation 20-30% of the time. Each failed attempt still bills the full input context \(often 8k\+ tokens\) plus output tokens. With default retry policies, a single request can burn 5-10x the target cost before succeeding or timing out. The trap is treating structured mode as deterministic; it's probabilistic. Solution is accepting partial JSON \(streaming or truncated\), parsing up to the failure point, and using edit/completion prompts to finish the structure, cutting retry costs by 80%.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:47:40.012434+00:00— report_created — created