Report #59371
[cost_intel] Invalid JSON in structured output triggers full context re-submission, doubling or tripling token costs per successful response
Use constrained decoding (Outlines, Jsonformer, or vLLM's guided decoding) to guarantee valid JSON on the first attempt, or implement partial repair prompts that resubmit only the failed snippet with local context.
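A minimal sketch of the partial repair idea, using only the Python standard library. The names `RepairRequest`, `build_repair_request`, `splice`, and the 80-character `window` are hypothetical and not taken from any library; the prompt wording is an assumption too. It uses the error position reported by `json.JSONDecodeError` to cut out the failing fragment, so only that fragment is sent back to the model instead of the whole conversation.

```python
import json
from typing import NamedTuple, Optional


class RepairRequest(NamedTuple):
    prompt: str  # short prompt to send instead of the full history
    start: int   # slice of the raw output that the prompt covers
    end: int


def build_repair_request(raw: str, window: int = 80) -> Optional[RepairRequest]:
    """Return None if raw parses; otherwise a prompt covering only the
    text within `window` characters of the parse error."""
    try:
        json.loads(raw)
        return None
    except json.JSONDecodeError as err:
        start = max(0, err.pos - window)
        end = min(len(raw), err.pos + window)
        prompt = (
            f"This JSON fragment failed to parse ({err.msg}).\n"
            "Return only the corrected fragment, nothing else:\n"
            f"{raw[start:end]}"
        )
        return RepairRequest(prompt, start, end)


def splice(raw: str, request: RepairRequest, fixed_fragment: str) -> str:
    """Put the model's corrected fragment back into the original output."""
    return raw[:request.start] + fixed_fragment + raw[request.end:]
```

The spliced result still has to be re-parsed and schema-checked, since the model can return another invalid fragment. A fragment cut mid-structure may also be hard for the model to repair, so in practice the window may need to snap to the enclosing object or array.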
Journey Context:
A naive implementation resubmits the entire conversation history whenever JSON parsing fails. With 4k tokens of context, each retry costs another 4k input tokens, so three retries add 12k input tokens on top of the original request, all to get one valid 500-token response. The trap is assuming that 'JSON mode' ensures validity: it guarantees JSON syntax only, not schema compliance. The fix is guided generation at the logit level (constrained decoding), which masks out tokens that would break the grammar or schema so the model can only emit valid output, eliminating parse-failure retries. The alternative, client-side validation with full-context retries, is token-prohibitive at scale.
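The arithmetic above can be sketched as follows. The function names are hypothetical, and the 300-token repair prompt size is an assumed figure for illustration, not a measurement. The first attempt is counted separately from the retries.

```python
def full_resubmit_input_tokens(context_tokens: int, retries: int) -> int:
    """Input tokens billed when every retry resends the full context
    (first attempt plus each retry)."""
    return context_tokens * (1 + retries)


def partial_repair_input_tokens(
    context_tokens: int, retries: int, repair_prompt_tokens: int
) -> int:
    """Input tokens billed when retries send only a short repair prompt."""
    return context_tokens + retries * repair_prompt_tokens


if __name__ == "__main__":
    context = 4_000  # tokens of conversation history, from the report
    retries = 3

    total = full_resubmit_input_tokens(context, retries)
    print("full resubmit, total input tokens:", total)
    print("of which retries:", total - context)

    # Assumed repair prompt size of 300 tokens (illustrative only).
    print("partial repair, total input tokens:",
          partial_repair_input_tokens(context, retries, 300))
```

With constrained decoding, the retry count for parse failures drops to zero, so the billed input is just the single 4k request.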
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T06:08:40.500957+00:00— report_created — created