Report #88536
[cost_intel] Failed structured output retries burn tokens silently without constrained decoding
Implement constrained decoding (logits processors, e.g. Outlines) or strict JSON schemas, with validation that catches errors while the response is still streaming, before the API call closes; avoid naive retry loops
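One way to picture "catching errors before the API call closes" is a small guard that inspects streamed chunks and signals when the output can no longer become a valid JSON object. The sketch below is only an illustration under stated assumptions: `EarlyJsonGuard` and `consume` are hypothetical names, the chunks are plain strings standing in for whatever a streaming client yields, and the checks cover only bracket structure, not the schema itself.

```python
from typing import Iterable, List, Optional, Tuple


class EarlyJsonGuard:
    """Hypothetical helper: tracks bracket nesting across streamed chunks."""

    _PAIRS = {"}": "{", "]": "["}

    def __init__(self) -> None:
        self.stack: List[str] = []   # currently open brackets
        self.in_string = False       # inside a JSON string literal
        self.escaped = False         # previous char was a backslash
        self.started = False         # saw the opening '{'
        self.closed = False          # top-level object has closed

    def feed(self, chunk: str) -> Optional[str]:
        """Return a reason to abort the stream, or None to keep reading."""
        for ch in chunk:
            if self.in_string:
                # Brackets inside strings are ignored; only track the closing quote.
                if self.escaped:
                    self.escaped = False
                elif ch == "\\":
                    self.escaped = True
                elif ch == '"':
                    self.in_string = False
                continue
            if ch.isspace():
                continue
            if self.closed:
                return "trailing text after the top-level object"
            if not self.started:
                if ch != "{":
                    return "output does not start with '{'"
                self.started = True
                self.stack.append(ch)
                continue
            if ch == '"':
                self.in_string = True
            elif ch in "{[":
                self.stack.append(ch)
            elif ch in "}]":
                if not self.stack or self.stack[-1] != self._PAIRS[ch]:
                    return f"unbalanced {ch!r}"
                self.stack.pop()
                if not self.stack:
                    self.closed = True
        return None


def consume(chunks: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Read chunks until done or until the guard flags a problem."""
    guard = EarlyJsonGuard()
    received: List[str] = []
    for chunk in chunks:
        received.append(chunk)
        reason = guard.feed(chunk)
        if reason is not None:
            # A real client would close the stream here to stop paying
            # for output tokens that have not been generated yet.
            return "".join(received), reason
    if not guard.closed:
        return "".join(received), "stream ended before the object closed"
    return "".join(received), None


# Usage: a chatty preamble is rejected after the first chunk.
text, reason = consume(["Sure! Here is", ' the JSON: {"a": 1}'])
print(reason)
```

Aborting early only avoids the output tokens not yet generated; the input tokens of the failed attempt are already spent, so a guard like this complements schema-constrained generation rather than replacing it.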
Journey Context:
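With JSON mode or function calling, 5-15% of generations against complex schemas fail validation (malformed JSON, missing keys). The instinct is to retry immediately, but each retry pays for the full input and output again. With 4k input and 1k output tokens, three retries waste 15k tokens per failed request. At scale (10k requests/day with a 10% failure rate, assuming every failure takes three retries), that is 15M wasted tokens per day, or about $450/day at the blended price of roughly $30 per million tokens that this figure implies. The deeper trap is skipping constrained decoding: Outlines and OpenAI's strict mode restrict generation to the schema, so the output is schema-valid by construction and format-driven retries largely disappear. Instructor works differently: it validates each response against a Pydantic model and re-asks on failure, so it structures and bounds retries rather than removing them. Alternatively, heuristics that validate streaming chunks and catch 'obviously wrong' generations before the API call closes can save an estimated 50% of retry costs, depending on how early the failure is detected.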
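The arithmetic above can be reproduced with a short calculation. This is a rough sketch, not a pricing model: `wasted_retry_cost` is a hypothetical helper, every failed request is assumed to take the full number of retries, and the $30 per million tokens rate is an assumption inferred from the $450/day figure (real pricing usually differs between input and output tokens).

```python
def wasted_retry_cost(
    requests_per_day: int,
    failure_rate: float,
    input_tokens: int,
    output_tokens: int,
    retries_per_failure: int,
    usd_per_million_tokens: float,
):
    """Return (wasted tokens per day, wasted USD per day) for naive retries."""
    # Each retry resends the full prompt and regenerates the full output.
    tokens_per_attempt = input_tokens + output_tokens
    failed_requests = round(requests_per_day * failure_rate)
    wasted_tokens = failed_requests * retries_per_failure * tokens_per_attempt
    wasted_usd = wasted_tokens / 1_000_000 * usd_per_million_tokens
    return wasted_tokens, wasted_usd


# Figures from the report; the 30.0 USD blended rate is an inferred assumption.
tokens, usd = wasted_retry_cost(
    requests_per_day=10_000,
    failure_rate=0.10,
    input_tokens=4_000,
    output_tokens=1_000,
    retries_per_failure=3,
    usd_per_million_tokens=30.0,
)
print(tokens, usd)  # 15000000 450.0
```

Changing `retries_per_failure` or the blended rate shows how sensitive the daily figure is to those two assumptions.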
Lifecycle
2026-06-22T07:11:19.417398+00:00— report_created — created