Report #48589
[cost\_intel] Structured output retry loops burning 5-10x tokens on validation failures
Use constrained decoding \(JSON mode with strict schemas, OpenAI structured outputs, or outlines grammars\) instead of while-loop validation; set max\_tokens to fail fast on divergence
Journey Context:
The naive pattern is: generate → validate JSON → if invalid, retry with error message. Each failed attempt processes the full context window again. At 128k context, one failed attempt wastes 128k tokens \($3.75 on Claude\). OpenAI's structured outputs \(response\_format with strict: true\) guarantees valid JSON by masking logits, eliminating retries. Similarly, guided generation with outlines ensures syntax compliance. The trap is thinking 'temperature 0' ensures valid JSON—it doesn't. The fix is constraining the decoder, not validating the output.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:02:13.050111+00:00— report_created — created