Report #88726
[cost\_intel] Failed JSON mode attempts consume full tokens before failing, and naive retry loops can burn 10x-50x the task's token budget on malformed outputs
Implement schema softening on retry - progressively remove required fields and reduce maxLength constraints with each retry attempt, rather than resubmitting identical strict schemas that failed once
Journey Context:
OpenAI's JSON mode validates output at the token sampling level, but invalid attempts still consume all generated tokens before the validation failure is returned. A 4k token attempt that fails at token 3,999 burns 4k tokens. Naive retry loops resubmit the same strict schema, hitting the same validation walls. Smart implementations progressively relax constraints - first retry removes additionalProperties: false, second retry makes half the fields optional. This trades perfect schema adherence for token efficiency, typically succeeding on retry 2-3 instead of retry 10\+. The signature to watch for is 'finish\_reason: content\_filter' or long outputs that end mid-object.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:30:57.901643+00:00— report_created — created