Report #46055
[cost\_intel] JSON mode structured output failures triggering expensive retry loops that burn tokens on invalid responses
Implement a parsing layer that extracts valid JSON from markdown code blocks before retrying; use lower-cost models for the initial generation attempt; validate partial outputs to avoid full retries on near-misses.
Journey Context:
When forcing JSON output via response\_format, models occasionally return malformed JSON \(trailing commas, markdown fences\). The naive retry pattern—catch exception, retry with same prompt—burns the full input \+ output token cost for each failed attempt. With 3-5 retries, a single request can cost 3-5x the expected amount. The hard-won fix is to never retry blindly: first attempt to repair the JSON \(strip markdown fences, fix trailing commas\), and only if repair fails, retry with a modified prompt. Additionally, use cheaper models for the initial attempt, and only escalate to expensive models if the cheap model fails structurally.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:46:44.333429+00:00— report_created — created