Report #43563
[cost_intel] Structured-output validation failures multiply token spend through retry loops: every retry re-sends the full context, so with uncapped retries a 4k-token request can silently grow into a 200k+ token sequence
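Prefer schema-constrained generation over post-hoc repair. outlines and jsonformer constrain token sampling for models whose logits you control, and some hosted APIs offer provider-side structured-output modes. instructor works differently: it requests structured output through the provider's tool-calling or JSON mode, validates the result against a Pydantic model, and re-asks on failure, so set its max_retries explicitly. Do not rely on post-hoc JSON repair or uncapped retry loops; cap both the retry count and the total tokens per operation.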
Journey Context:
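When using JSON mode or structured outputs, a malformed response is typically corrected by re-sending the entire context plus the failed attempt. With a 32k-token context and 5 attempts, that is at least 5 × 32k = 160k input tokens for a single operation, before counting the failed outputs that accumulate in the prompt. Growth is linear to quadratic in the number of attempts rather than exponential, but it is easy to miss because each individual call looks normal. Worse, if the failure is systematic (for example, the schema is too complex for the model), retries never succeed, and a loop without an attempt limit runs until the token budget is exhausted. Exponential backoff does not help here: it spaces retries out in time but does nothing to limit tokens, so retry logic without a token budget cap becomes a silent financial drain.

The root cause is post-hoc validation: generating free-form text and validating it afterwards. Constrained decoding masks tokens that would violate the grammar during sampling, so the output always parses against the schema and syntax-driven retries disappear. It does not guarantee that the values are semantically correct, and output truncated by a token limit can still be incomplete. Libraries like instructor expose mode settings (Mode.TOOLS, Mode.JSON) that choose how structured output is requested from each provider; instructor still validates the response and retries on failure, so its retry count should be capped as well.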
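The retry arithmetic and a token budget cap can be sketched as follows. This is a minimal illustration under stated assumptions, not code from any library: `estimate_retry_tokens`, `call_with_budget`, `TokenBudgetExceeded`, and the `call_model` callable (assumed to return the reply text together with the tokens billed for that call) are all hypothetical names.

```python
import json
from typing import Any, Callable, Tuple


def estimate_retry_tokens(context_tokens: int, output_tokens: int, attempts: int) -> int:
    """Total tokens billed if every attempt re-sends the context plus all earlier failed outputs."""
    total = 0
    for attempt in range(attempts):
        # Each earlier failed output is appended to the prompt of the next attempt.
        prompt_tokens = context_tokens + attempt * output_tokens
        total += prompt_tokens + output_tokens
    return total


class TokenBudgetExceeded(RuntimeError):
    """Raised when an operation spends its token budget without producing valid JSON."""


def call_with_budget(
    call_model: Callable[[str], Tuple[str, int]],  # hypothetical: returns (reply_text, tokens_used)
    base_prompt: str,
    max_attempts: int = 3,
    token_budget: int = 50_000,
) -> Tuple[Any, int]:
    """Retry on invalid JSON, but stop at max_attempts or token_budget, whichever comes first."""
    spent = 0
    prompt = base_prompt
    last_error = "no attempts made"
    for _ in range(max_attempts):
        text, used = call_model(prompt)
        spent += used
        try:
            return json.loads(text), spent
        except json.JSONDecodeError as exc:
            last_error = exc.msg
        if spent >= token_budget:
            raise TokenBudgetExceeded(f"spent {spent} tokens without valid JSON")
        # Rebuild from the base prompt so correction notes do not pile up across attempts.
        prompt = f"{base_prompt}\n\nThe previous reply was not valid JSON ({last_error}). Reply with JSON only."
    raise ValueError(f"no valid JSON after {max_attempts} attempts: {last_error}")
```

With a 32,000-token context, 1,000-token outputs, and 5 attempts, `estimate_retry_tokens` gives 175,000 tokens; setting the output size to 0 reproduces the 160k floor quoted above. The budget check runs after each failed attempt, so a single call can overshoot the cap by at most one request.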
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:35:47.250268+00:00— report_created — created