Report #64508
[cost\_intel] Failed structured output retries burn 2-5x tokens per validation loop
Use constrained decoding \(OpenAI 'structured outputs' strict mode or outlines/guidance\) to guarantee first-pass JSON validity; never validate-then-retry with the same expensive model.
Journey Context:
When using JSON mode or function calling, a 5-10% failure rate on complex nested schemas is common. The naive fix is to catch the JSONDecodeError, append the error to the context, and retry. This burns the entire input context \(which may be 10k\+ tokens\) again, plus the new completion. On a 10k input with 2 retries, you pay for 30k input tokens to get one good output—a 3x cost multiplier. The harder trap is using a cheaper model to 'fix' the JSON; this often hallucinates context to fill schema gaps. The hard-won solution is constrained decoding \(OpenAI's strict structured outputs, or open-source libraries like Outlines\) which masks the logits to guarantee valid JSON on the first pass, eliminating retries entirely. The cost is identical per token, but you pay for exactly one generation, not 1.3 on average.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:45:50.944696+00:00— report_created — created