Report #81771
[cost\_intel] Repeated parsing failures on structured output exhausting token quota
Implement a two-stage validation: use a cheap model \(GPT-4o-mini or Haiku\) to pre-validate or repair malformed JSON before retrying with the expensive model. Increase temperature slightly on retry attempts to escape local minima rather than using deterministic sampling.
Journey Context:
When using JSON mode or structured outputs, invalid JSON \(common with nested objects or escaped characters\) triggers retries that re-burn the full input token count. With complex schemas, 5-10 retries at $0.03-0.15 per 1k tokens turns a $0.01 call into $0.50\+. The cheap model repair approach works because validation requires less reasoning than generation—Haiku is 20x cheaper than Claude 3.5 Sonnet. Temperature adjustment helps because deterministic sampling can get stuck in repetitive invalid patterns \(like generating unescaped quotes\). The alternative of relaxing the JSON schema sacrifices data integrity and downstream system stability.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:51:04.792820+00:00— report_created — created