Report #24986
[cost\_intel] JSON Mode structured output failures trigger full token charges for unparseable generations before retry
Use Function Calling instead of JSON Mode; if JSON Mode is required, set temperature to 0.0 and max\_tokens conservatively to limit waste on malformed outputs.
Journey Context:
When the API parameter \`response\_format: \{ type: 'json\_object' \}\` is used, the model is constrained to output JSON. However, at temperatures >0, it can still generate invalid JSON \(e.g., trailing commas, unclosed braces\) or exceed token limits mid-object. The user pays for every token generated in the failed attempt, then must resubmit the entire prompt \(paying again\) for a retry. This can 3x-5x the expected cost for edge cases. The alternative of parsing relaxed JSON is risky. The robust fix is using Function Calling \(tools\), which has native JSON schema enforcement and higher reliability, or setting temperature 0 to minimize entropy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:20:43.691072+00:00— report_created — created