Report #97535
[cost\_intel] JSON mode and structured output failures cause repeated full-prompt retries that multiply cost
Use native structured outputs / constrained decoding when available, tighten schemas \(avoid anyOf/oneOf where possible\), validate inputs before sending, set max\_tokens to cap the output of each attempt, and log the retry count plus the tokens consumed per retry so retries do not hide in averages.
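A minimal sketch of the logging part of this advice, assuming a hypothetical `call_model(prompt, max_tokens)` callable that returns the response text together with input and output token counts. The `status` field, the allowed values, and the stand-in `fake_model` are illustrative only and do not correspond to any particular provider's API.

```python
import json
from typing import Callable, List, Optional, Tuple

# Hypothetical signature: (prompt, max_tokens) -> (text, input_tokens, output_tokens)
ModelCall = Callable[[str, int], Tuple[str, int, int]]

ALLOWED_STATUS = {"open", "closed"}  # illustrative enum, kept deliberately small


def validate(text: str) -> dict:
    """Check syntax and semantics; raise ValueError on either kind of failure."""
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or data.get("status") not in ALLOWED_STATUS:
        raise ValueError("status must be one of %s" % sorted(ALLOWED_STATUS))
    return data


def call_with_retry_log(
    call_model: ModelCall,
    prompt: str,
    max_tokens: int = 256,
    max_retries: int = 2,
) -> Tuple[Optional[dict], List[dict]]:
    """Retry on validation failure and record token usage for every attempt."""
    log: List[dict] = []
    for attempt in range(max_retries + 1):
        text, in_tok, out_tok = call_model(prompt, max_tokens)
        entry = {
            "attempt": attempt,
            "input_tokens": in_tok,   # the full prompt is billed on every attempt
            "output_tokens": out_tok,
            "valid": False,
        }
        log.append(entry)
        try:
            result = validate(text)
        except ValueError as exc:
            entry["error"] = str(exc)
            continue
        entry["valid"] = True
        return result, log
    return None, log  # all attempts failed; the log still shows what was spent


# Stand-in model: first reply is valid JSON with an invalid value, second is fine.
_replies = iter(['{"status": "maybe"}', '{"status": "open"}'])


def fake_model(prompt: str, max_tokens: int) -> Tuple[str, int, int]:
    return next(_replies), len(prompt.split()), 5


result, log = call_with_retry_log(fake_model, "classify this ticket")
total_input_tokens = sum(e["input_tokens"] for e in log)
```

Keeping one log entry per attempt, rather than one per request, is what makes the retried input tokens visible: here the same prompt is counted twice in `total_input_tokens`.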
Journey Context:
Each retry resends the entire prompt and is billed again. Complex nested schemas, loose types, and large enums raise the failure rate. Models can also emit syntactically valid JSON with semantically invalid values, which triggers application-level retries that pay for the same prompt two or more times. Native structured output modes reduce parse and schema failures by constraining generation at the token sampler rather than parsing after the fact; semantic checks still have to run in the application. The right metric is cost per successfully validated response, not cost per attempt.
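The difference between the two metrics can be sketched as follows. The `Attempt` record, the function names, and the per-million-token prices are assumptions made for illustration, not real pricing.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    input_tokens: int
    output_tokens: int
    valid: bool  # True only if the response passed schema and semantic validation


def attempt_cost(a: Attempt, in_price: float, out_price: float) -> float:
    """Cost of one attempt; prices are per million tokens."""
    return (a.input_tokens * in_price + a.output_tokens * out_price) / 1_000_000


def cost_per_attempt(attempts: List[Attempt], in_price: float, out_price: float) -> float:
    """Average spend per call, which hides retries."""
    if not attempts:
        return 0.0
    total = sum(attempt_cost(a, in_price, out_price) for a in attempts)
    return total / len(attempts)


def cost_per_validated_response(
    attempts: List[Attempt], in_price: float, out_price: float
) -> float:
    """Total spend across all attempts divided by the number of valid responses."""
    total = sum(attempt_cost(a, in_price, out_price) for a in attempts)
    successes = sum(1 for a in attempts if a.valid)
    if successes == 0:
        return float("inf")  # money was spent and nothing usable came back
    return total / successes


# One request that needed two retries before validating.
# Hypothetical prices: 3.0 per million input tokens, 15.0 per million output tokens.
attempts = [
    Attempt(2000, 500, False),
    Attempt(2000, 500, False),
    Attempt(2000, 500, True),
]
per_attempt = cost_per_attempt(attempts, 3.0, 15.0)
per_valid = cost_per_validated_response(attempts, 3.0, 15.0)
```

With these illustrative numbers the per-attempt average comes to about 0.0135 while the cost per validated response is about 0.0405, three times higher, because all three attempts were paid for to obtain one usable result.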
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:17:06.363791+00:00— report_created — created