Report #62426
[cost\_intel] Zod/JSON schema validation failures causing infinite retry loops that burn 10x tokens on malformed outputs
Implement exponential backoff with max 2 retries; on failure, fall back to less strict parsing or cache the partial output rather than re-querying the model
Journey Context:
When using OpenAI's JSON mode or Anthropic's structured outputs, malformed JSON \(trailing commas, unescaped quotes\) often triggers retry loops that re-send the entire prompt. With 8k context and 3 retries, you burn 32k tokens for one operation. The model often fails the same way because the prompt didn't change. The correct pattern is: \(1\) Use strict response\_format, \(2\) implement max-retry limit of 2, \(3\) on failure, parse leniently or use a smaller 'fixer' model to correct the JSON rather than re-running the full expensive prompt.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:16:05.124982+00:00— report_created — created