Report #96892
[cost_intel] Failed structured-output retries burning 3-5x token budget on validation loops
Use API-level constrained decoding (OpenAI JSON mode or strict Structured Outputs; Anthropic tool use with a forced tool_choice) instead of client-side validation loops; set 'response_format' to a strict JSON schema, since plain JSON mode guarantees valid JSON but not schema adherence; pre-validate that the schema is under 2KB; cap retries at 1 behind a circuit breaker; for complex validation, use a cheaper model to fix JSON syntax before re-escalating.
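A minimal Python sketch of this workaround, assuming plain request bodies for OpenAI Chat Completions (`response_format` of type `json_schema` with `strict: true`) and the Anthropic Messages API (`tools` plus a `tool_choice` of type `tool`); no SDK or network call is made. The helper names (`check_schema_size`, `openai_strict_body`, `anthropic_forced_tool_body`, `CircuitBreaker`, `call_structured`), the breaker's threshold and cooldown, and the `repair` hook standing in for the cheaper model are all hypothetical.

```python
import json
import time
from typing import Any, Callable, Optional

MAX_SCHEMA_BYTES = 2048  # the report's "<2KB" guideline


def check_schema_size(schema: dict) -> None:
    """Reject schemas whose compact JSON encoding is 2KB or larger."""
    size = len(json.dumps(schema, separators=(",", ":")).encode("utf-8"))
    if size >= MAX_SCHEMA_BYTES:
        raise ValueError(f"schema is {size} bytes; keep it under {MAX_SCHEMA_BYTES}")


def openai_strict_body(model: str, messages: list, name: str, schema: dict) -> dict:
    """Chat Completions request body asking for schema-constrained output."""
    check_schema_size(schema)
    return {
        "model": model,
        "messages": messages,
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "strict": True, "schema": schema},
        },
    }


def anthropic_forced_tool_body(model: str, messages: list, name: str, schema: dict) -> dict:
    """Messages API request body that forces the model to call one tool."""
    check_schema_size(schema)
    return {
        "model": model,
        "max_tokens": 1024,  # required by the Messages API; the value here is arbitrary
        "messages": messages,
        "tools": [{"name": name, "input_schema": schema}],
        "tool_choice": {"type": "tool", "name": name},
    }


class CircuitBreaker:
    """Opens after `threshold` consecutive failures; rejects calls for `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.threshold, self.cooldown, self.clock = threshold, cooldown, clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        if self.clock() - self.opened_at >= self.cooldown:
            # Cooldown elapsed: close again, but one more failure re-opens it.
            self.opened_at = None
            self.failures = self.threshold - 1
            return True
        return False

    def record(self, ok: bool) -> None:
        if ok:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()


def call_structured(call_fn: Callable[[], str], validate: Callable[[Any], bool],
                    breaker: CircuitBreaker, max_retries: int = 1,
                    repair: Optional[Callable[[str], str]] = None) -> Any:
    """Invoke call_fn at most 1 + max_retries times, unchanged, so no error history is appended."""
    for _ in range(1 + max_retries):
        if not breaker.allow():
            raise RuntimeError("circuit open: skipping structured-output call")
        raw = call_fn()
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            data = None
            if repair is not None:
                # Hypothetical hook: a cheaper model rewrites the malformed JSON.
                try:
                    data = json.loads(repair(raw))
                except json.JSONDecodeError:
                    data = None
        ok = data is not None and validate(data)
        breaker.record(ok)
        if ok:
            return data
    raise ValueError(f"no valid output after {1 + max_retries} attempts")
```

Because `call_fn` is reused unchanged, each retry here costs about one base call rather than a growing history; whether dropping the error feedback is acceptable depends on the task.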
Journey Context:
Common pattern: LLM generates JSON -> client parses -> validation fails (missing field, wrong type) -> append error to messages -> retry. Each retry resends the full conversation history plus the error explanation, so every cycle consumes roughly 2-3x the tokens of the original call. With 2 retries at 2x each, total spend reaches about 1 + 2 + 2 = 5x the single-call cost, a 400% increase. Modern APIs provide grammar-constrained decoding at inference time (JSON mode, strict tool calling), which eliminates JSON syntax errors (barring output cut off at the token limit); strict schema modes also enforce required fields and types. The semantic validation errors that remain should not trigger retries: it is better to return partial data with error flags than to burn tokens on edge cases. The trap is assuming client-side validation is 'safer'; it is actually more expensive and slower.
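The cost arithmetic and the "partial data with error flags" alternative can be sketched in Python as below. The per-retry multiplier is the report's 2-3x estimate applied as a flat factor on the first call's token count; that is a simplifying assumption, since real cost depends on how fast the history grows. The names `retry_loop_cost`, `PartialResult`, and `validate_partial` are hypothetical.

```python
from dataclasses import dataclass, field


def retry_loop_cost(base_tokens: int, retries: int, per_retry_multiplier: float) -> float:
    """Total tokens when each retry costs `per_retry_multiplier` times the first call."""
    return base_tokens * (1 + retries * per_retry_multiplier)


@dataclass
class PartialResult:
    data: dict                                  # fields that passed validation
    errors: list = field(default_factory=list)  # flags for fields that did not


def validate_partial(obj: dict, expected: dict) -> PartialResult:
    """Check each expected field's type; keep what passes and flag the rest, with no retry."""
    result = PartialResult(data={})
    for name, expected_type in expected.items():
        if name not in obj:
            result.errors.append(f"missing field: {name}")
        elif not isinstance(obj[name], expected_type):
            got = type(obj[name]).__name__
            result.errors.append(f"wrong type for {name}: expected {expected_type.__name__}, got {got}")
        else:
            result.data[name] = obj[name]
    return result


base = 1_000
total = retry_loop_cost(base, retries=2, per_retry_multiplier=2.0)
print(total, f"{(total - base) / base:.0%} increase")  # 5000.0 400% increase

partial = validate_partial({"id": 7, "price": "9.99"}, {"id": int, "price": float, "sku": str})
print(partial.data)    # {'id': 7}
print(partial.errors)  # ['wrong type for price: expected float, got str', 'missing field: sku']
```

With one retry at the same multiplier the total is 3x, so the title's 3-5x range corresponds to one or two retries at 2x each.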
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:12:56.145721+00:00— report_created — created