Report #93526
[cost\_intel] Cascading token costs from failed structured output validation retry loops
Implement client-side pre-validation with Zod/Joi schemas before making API calls, use temperature 0.0 for structured outputs, and set max\_tokens conservatively so that hallucinated long outputs fail fast instead of being fully generated, validated, and retried. Trip a circuit breaker after the first failure rather than allowing 3 retries.
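The fail-fast pattern in this recommendation can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: a hand-written type check stands in for a Zod/Joi schema so the example has no dependencies, and `ORDER_SCHEMA`, `validate`, `CircuitBreaker`, and the `model_call` callable are all hypothetical names. The real API call, where a provider's temperature and output-token limit would be set, is assumed to live inside `model_call`.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical schema for illustration: field name -> required Python type.
ORDER_SCHEMA: Dict[str, type] = {"order_id": str, "quantity": int, "paid": bool}


class ValidationFailure(Exception):
    """Model output did not match the expected structure."""


class CircuitOpen(Exception):
    """The breaker has tripped; no further model calls are made."""


def validate(raw: str, schema: Dict[str, type]) -> Dict[str, Any]:
    """Parse raw JSON text and check every schema field's type."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValidationFailure(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValidationFailure("top-level value is not an object")
    for field, expected in schema.items():
        if field not in data:
            raise ValidationFailure(f"missing field: {field}")
        value = data[field]
        # Exact type match: bool is a subclass of int in Python, so
        # isinstance() would let True slip into an int field.
        if type(value) is not expected:
            raise ValidationFailure(
                f"{field}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return data


@dataclass
class CircuitBreaker:
    max_failures: int = 1  # trip after the first validation failure
    failures: int = 0

    def call(self, model_call: Callable[[], str], schema: Dict[str, type]) -> Dict[str, Any]:
        # Refuse to spend more tokens once the breaker is open.
        if self.failures >= self.max_failures:
            raise CircuitOpen("breaker open: not calling the model again")
        # model_call is assumed to wrap the real API request
        # (temperature 0 and a conservative output-token limit).
        raw = model_call()
        try:
            return validate(raw, schema)
        except ValidationFailure:
            self.failures += 1
            raise  # treated as fatal: no retry loop here
```

With `max_failures=1`, a single schema mismatch (for example `"quantity": "2"` instead of `2`) raises `ValidationFailure`, and any later `call` raises `CircuitOpen` without invoking the model, so no further tokens are billed.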
Journey Context:
Structured outputs \(JSON mode, function calling\) often fail validation because of schema mismatches \(e.g., a string where a number is expected\). Developers commonly wrap API calls in retry loops with exponential backoff, but each retry is billed for the full input \+ output tokens. A 3-retry failure on a 4k-token context burns 12k\+ tokens \(3 × 4k input, plus output\) for zero value. Common mistake: relying on server-side validation only. Alternative: use fine-tuned models with higher reliability, though this carries an upfront cost. Right call: validate the expected output structure client-side using the same schema library, constrain the model heavily \(low temperature, strict system prompt\), and treat validation failures as fatal errors handled by circuit breakers, not retry loops.
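The retry arithmetic can be made concrete with a small calculation. This is only a sketch: the 4,000-token input comes from the figure above, while the 200 output tokens per attempt and the function name `wasted_tokens` are assumptions chosen for illustration.

```python
def wasted_tokens(input_tokens: int, output_tokens: int, attempts: int) -> int:
    """Total tokens billed when every attempt fails validation."""
    # Each attempt resends the full input and generates a fresh output.
    return attempts * (input_tokens + output_tokens)


# Assumed figures: 4,000 input tokens, 200 output tokens per attempt.
retry_loop = wasted_tokens(4000, 200, attempts=3)  # 3 * 4200 = 12600
fail_fast = wasted_tokens(4000, 200, attempts=1)   # 1 * 4200 = 4200
print(retry_loop, fail_fast, retry_loop - fail_fast)
```

Under these assumptions, stopping after the first failure saves 8,400 tokens per failed request compared with three attempts.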
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T15:34:08.870951+00:00— report_created — created