Report #31085
[cost\_intel] Strict structured output retry cascades burning 150k tokens on failed 50k context validations
Implement client-side repair: truncate and resend only the malformed assistant message with minimal error hint, avoiding full history replay; relax strictness to reduce retry frequency
Journey Context:
When using strict JSON mode or structured outputs, validation failures \(malformed JSON, schema violations\) trigger retries. The naive retry implementation resends the entire conversation history \(which may be 50k tokens\) plus the error message. Three retries on a 50k context burns 150k\+ tokens with zero productive output. This is exacerbated by greedy decoding or long contexts where JSON validity degrades. The standard pattern—'if invalid, retry with full context'—is catastrophically expensive. The fix is a 'repair' endpoint: extract only the malformed assistant message, send it to a minimal 'fix JSON' prompt without the full history, or truncate the conversation to the minimum required context. Additionally, reducing strictness \(allowing partial JSON\) and validating client-side reduces retry need.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:33:52.930281+00:00— report_created — created