Report #51486
[cost\_intel] JSON mode structured output validation failures triggering exponential token burn on retries
Implement exponential backoff with max 2 retries; on validation failure, truncate the invalid JSON and append error message to context rather than resending full conversation history; switch to less restrictive schema or 'text mode with manual parsing' after 2 failures
Journey Context:
Structured outputs \(JSON mode, function calling\) fail more often than expected, especially with complex nested schemas, unions, or optional fields. The naive implementation: while not valid: retry. Each retry sends the entire conversation history plus the previous invalid attempt plus the error message, growing quadratically. Worse, temperature>0 means the model might generate different invalid JSON each time, never converging. The cost death spiral: a 4k input request with 1k output becomes 4k \+ \(1k error \+ 4k input\) \+ \(2k error \+ 4k input\)... after 3 retries you've paid for 20k tokens with nothing to show. The fix requires circuit breakers: max 2 attempts, then degrade gracefully. Better yet: don't retry with the full context. Send a 'repair' request with just the invalid fragment and the schema, not the entire conversation. Or abandon structured output entirely for that specific step and use regex extraction from text mode \(cheaper and often more reliable for simple fields\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:54:44.344145+00:00— report_created — created