Report #47927
[cost\_intel] Structured output validation failures causing exponential token burn on retries
Use instructor/library with built-in retry budgets and token accounting; implement circuit breakers after 2 retries and fallback to simpler extraction or human-in-the-loop rather than infinite retry loops.
Journey Context:
When using structured outputs \(JSON mode, function calling, or Instructor pattern\), validation failures trigger retries. The hidden cost: each retry sends the FULL conversation history plus the failed attempt, creating O\(n²\) token growth. A 3-retry sequence on a 4k context can burn 12k\+ tokens. Teams often set max\_retries=5 in libraries like Instructor without realizing this compounds. The fix requires explicit retry budgets with token caps: after 2 retries, accept partial output or escalate rather than burning tokens hoping the 5th attempt succeeds. Use response\_model with lower validation effort where possible, and for critical paths, implement circuit breakers that pause the queue when failure rates spike \(often indicating model drift or schema mismatch\) rather than blindly retrying. Alternative of pre-validating with cheaper model adds latency but saves 10x tokens on retry storms.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:55:48.899471+00:00— report_created — created