Report #68300
[cost_intel] Silent token burn from failed structured output retry loops
Validate the JSON Schema client-side before sending the request to the LLM, use 'strict' mode (OpenAI) or tool use with a forced tool call (Anthropic) so the first response is valid JSON, and cap retries at 2 with exponential backoff on validation errors.
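A minimal sketch of the capped retry loop described above. It assumes "cap at 2" means two retries after the first attempt (three calls at most). `call_model`, `require_keys`, and `StructuredOutputError` are hypothetical names used only for illustration; `require_keys` stands in for a real JSON Schema validator.

```python
import json
import time
from typing import Any, Callable

MAX_RETRIES = 2  # assumption: two retries after the first attempt


class StructuredOutputError(Exception):
    """Raised when no attempt produced valid output within the retry cap."""


def require_keys(*keys: str) -> Callable[[Any], None]:
    """Tiny stand-in for a schema validator: checks top-level keys only."""
    def check(obj: Any) -> None:
        if not isinstance(obj, dict):
            raise ValueError("expected a JSON object")
        missing = [k for k in keys if k not in obj]
        if missing:
            raise ValueError(f"missing keys: {missing}")
    return check


def call_with_capped_retries(
    call_model: Callable[[], str],
    validate: Callable[[Any], None],
    max_retries: int = MAX_RETRIES,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call the model, parse and validate its reply, retry at most max_retries times."""
    last_error = None
    for attempt in range(max_retries + 1):
        raw = call_model()  # each call is billed for the full prompt again
        try:
            parsed = json.loads(raw)  # JSONDecodeError is a ValueError subclass
            validate(parsed)
            return parsed
        except ValueError as exc:
            last_error = exc
        if attempt < max_retries:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, ... between attempts
    raise StructuredOutputError(
        f"gave up after {max_retries + 1} attempts: {last_error}"
    ) from last_error
```

Because the loop raises after the cap instead of continuing, the worst-case spend per request is bounded at three times the cost of one call.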
Journey Context:
When you ask an LLM for JSON output and it returns malformed JSON (common with greedy decoding or complex nested schemas), your code retries. Each retry resends the full conversation history plus the original prompt, so every attempt is billed at the full input cost again. For long contexts (32k+ tokens), one failed structured output attempt costs $0.50-$2.00. If your retry loop allows 5 attempts before failing, you've burned $2.50-$10 (5 × $0.50-$2.00) on a single request that ultimately fails. The root cause is often an overly complex JSON schema (deep nesting, anyOf/oneOf) that the model struggles to satisfy. The fix is to use provider-specific structured output features: OpenAI's Structured Outputs (a response_format of type json_schema with strict: true; plain JSON mode only ensures parseable JSON, not schema conformance) or Anthropic's tool use with a forced tool_choice. Strict mode checks the schema at the API level and constrains generation to it, and a forced tool call returns the answer as a structured tool input, so the retry loop for malformed JSON is rarely needed. Refusals and outputs truncated by the token limit can still fail, so keep the retry cap in place.
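The two provider features named above can be sketched as request arguments. This is an illustration under stated assumptions, not a definitive integration: `ORDER_SCHEMA`, `openai_strict_kwargs`, and `anthropic_forced_tool_kwargs` are hypothetical names, and the dictionaries follow the request shapes for OpenAI's `response_format` and Anthropic's `tools` / `tool_choice` parameters as I understand them, so check the current API references before relying on them.

```python
import json

# Hypothetical example schema. OpenAI strict mode expects every property to be
# listed in "required" and "additionalProperties" to be false.
ORDER_SCHEMA = {
    "type": "object",
    "properties": {
        "customer": {"type": "string"},
        "total_cents": {"type": "integer"},
    },
    "required": ["customer", "total_cents"],
    "additionalProperties": False,
}


def openai_strict_kwargs(schema: dict, name: str = "order") -> dict:
    """Extra keyword arguments for an OpenAI chat completion request."""
    return {
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": name, "strict": True, "schema": schema},
        }
    }


def anthropic_forced_tool_kwargs(schema: dict, name: str = "record_order") -> dict:
    """Extra keyword arguments for an Anthropic messages request."""
    return {
        "tools": [
            {
                "name": name,
                "description": "Record the extracted order.",
                "input_schema": schema,
            }
        ],
        # Forces the model to answer by calling this specific tool.
        "tool_choice": {"type": "tool", "name": name},
    }


if __name__ == "__main__":
    print(json.dumps(openai_strict_kwargs(ORDER_SCHEMA), indent=2))
    print(json.dumps(anthropic_forced_tool_kwargs(ORDER_SCHEMA), indent=2))
```

With the official SDKs these would typically be splatted into the request, for example `client.chat.completions.create(model=..., messages=..., **openai_strict_kwargs(ORDER_SCHEMA))` or `client.messages.create(model=..., max_tokens=..., messages=..., **anthropic_forced_tool_kwargs(ORDER_SCHEMA))`. Keeping the schema flat, as here, also addresses the deep-nesting root cause.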
Lifecycle
2026-06-20T21:07:35.492681+00:00— report_created — created