Report #72092
[cost\_intel] Cascading token costs from strict JSON mode validation failures
Implement 'rescue' parsing using a cheaper model \(e.g., Claude 3 Haiku or GPT-3.5-turbo\) to fix malformed JSON from the primary expensive model rather than re-running the full expensive generation; alternatively, use streaming JSON parsers to consume partial valid outputs before failure.
Journey Context:
When GPT-4 Turbo generates invalid JSON \(common with nested quotes or truncation\), naive implementations retry the entire request with the full context history plus the failure notice. At $0.03/1k input and $0.06/1k output, a 4k context retry burns $0.36. A Haiku 'fix-it' attempt costs $0.001. The pattern is 'cascading models': use the expensive model for reasoning, the cheap model for syntax repair. This requires parsing partial JSON and isolating the error node.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:35:28.867715+00:00— report_created — created