Report #87657
[cost_intel] Structured output retry loops cause exponential token burn on schema failures
Never use temperature 0 with structured outputs. Use a temperature of 0.7-1.0 to increase output diversity and reduce retry loops. Cap retries with a counter (2-3 at most), then fall back to a simpler schema or manual parsing rather than retrying indefinitely.
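A minimal sketch of the bounded-retry advice above. It assumes a hypothetical `call_model(prompt, temperature)` callable that returns raw model text. The names `structured_call`, `parse_strict`, and `lenient_parse` are illustrative and not part of any SDK, and the required-keys check stands in for real schema validation.

```python
import json
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

# Hypothetical model interface: (prompt, temperature) -> raw text.
ModelFn = Callable[[str, float], str]


def parse_strict(raw: str, required_keys: Iterable[str]) -> Dict:
    """Parse raw as a JSON object and require every key; raise ValueError otherwise."""
    data = json.loads(raw)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = set(required_keys) - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def lenient_parse(raw: str) -> Dict:
    """Fallback: try the outermost {...} span, else keep the raw text for manual handling."""
    start, end = raw.find("{"), raw.rfind("}")
    if start != -1 and end > start:
        try:
            data = json.loads(raw[start:end + 1])
            if isinstance(data, dict):
                return data
        except ValueError:
            pass
    return {"raw": raw}


def structured_call(
    call_model: ModelFn,
    prompt: str,
    required_keys: Iterable[str],
    max_retries: int = 2,
    temperatures: Sequence[float] = (0.7, 0.85, 1.0),
    fallback: Optional[Callable[[str], Dict]] = lenient_parse,
) -> Tuple[Dict, int]:
    """Return (parsed_output, attempts_made); makes at most max_retries + 1 calls."""
    keys = list(required_keys)
    raw = ""
    attempts = 0
    for attempt in range(max_retries + 1):
        # Step the temperature up on each retry instead of repeating identical parameters.
        temperature = temperatures[min(attempt, len(temperatures) - 1)]
        raw = call_model(prompt, temperature)
        attempts += 1
        try:
            return parse_strict(raw, keys), attempts
        except ValueError:
            continue
    if fallback is None:
        raise RuntimeError(f"no valid output after {attempts} attempts")
    return fallback(raw), attempts
```

The counter bounds spend at `max_retries + 1` calls per request. Returning `attempts` alongside the result lets the caller log how often retries happen.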
Journey Context:
When the model fails to produce valid JSON against a strict schema, developers often catch the exception and retry with the same strict parameters. At temperature 0 the model tends to repeat the same mistake deterministically, and every retry re-sends the full context (which may be 8k-128k tokens). At $10-50 per million tokens, three retries on a 32k-token context add 96k tokens, or roughly $1-5 per failed request. The trap is that the retry logic eventually "works" by luck, so developers do not notice the cost until the bill arrives. Quality also degrades because the model is forced into a corner; it is better to relax the schema or raise the temperature to break the deterministic failure loop.
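The cost estimate above can be checked with a short calculation. This sketch assumes only input tokens are billed, at a flat per-million rate, and counts the retries only, not the original attempt. `retry_cost_usd` is an illustrative helper, not a library function.

```python
def retry_cost_usd(context_tokens: int, retries: int, usd_per_million: float) -> float:
    """Cost of re-sending the same context on each retry (input tokens only)."""
    return context_tokens * retries * usd_per_million / 1_000_000


low = retry_cost_usd(32_000, 3, 10)   # cheapest rate quoted in the report
high = retry_cost_usd(32_000, 3, 50)  # most expensive rate quoted in the report
print(f"${low:.2f} to ${high:.2f} per failed request")  # prints "$0.96 to $4.80 per failed request"
```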
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:43:02.140590+00:00— report_created — created