Report #59211
[cost_intel] Choosing a cheaper model based on per-call price without accounting for retry rates on failed outputs
Calculate effective cost as base_cost times expected_calls_per_success. A model 10x cheaper that requires 3+ retries on 40% of inputs can be more expensive in total cost of ownership than a frontier model that succeeds on the first try.
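A minimal sketch of that formula in Python follows. It assumes attempts fail independently with a fixed probability, so the expected number of calls per success is 1 / (1 - failure_rate). That independence assumption is not stated in the report, and the function names are hypothetical, chosen to mirror the terms above.

```python
def expected_calls_per_success(failure_rate: float) -> float:
    """Mean number of attempts until the first success.

    Assumption: each attempt fails independently with the same
    probability (a geometric distribution), so the mean is 1 / (1 - p).
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def effective_cost(base_cost: float, failure_rate: float) -> float:
    """Effective cost = base_cost * expected_calls_per_success."""
    return base_cost * expected_calls_per_success(failure_rate)


if __name__ == "__main__":
    # $0.25 per million input tokens with a 60% failure rate.
    print(f"${effective_cost(0.25, 0.60):.3f}/M effective")
```

If retries are capped or failures cluster on particular inputs, the geometric mean no longer applies, and measuring attempts per success from production logs is likely the safer input.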
Journey Context:
A team switches from Sonnet ($3/M input tokens) to Haiku ($0.25/M input tokens) for a code generation task, expecting 12x savings. But Haiku produces syntactically invalid or logically wrong code on 35% of attempts, and each failure triggers a retry. If attempts fail independently, the average number of attempts per success is 1 / (1 - 0.35) ≈ 1.54, so the effective cost is $0.25/M times 1.54 ≈ $0.385/M. Sonnet fails on 5% of attempts, averaging 1 / 0.95 ≈ 1.05 tries, for an effective cost of about $3.16/M. Haiku is still cheaper, but the real savings are about 8.2x, not 12x.
Now consider a harder task where Haiku fails 60% of the time (2.5 average attempts): $0.625/M effective. That is still cheaper per token, but it leaves out the latency added by retries, the validation logic needed to detect failures, and the engineering time to build and maintain the retry infrastructure. At failure rates of 70% or more, the total cost of ownership often favors the frontier model. Always model effective cost, not per-call cost.
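To see how much of that conclusion depends on the non-token costs, the sketch below solves for the failure rate at which the cheaper model's effective per-token cost equals the frontier model's. It assumes independent attempts (expected attempts = 1 / (1 - p)) and counts input-token price only; the function name is hypothetical.

```python
def break_even_failure_rate(
    cheap_price: float,
    frontier_price: float,
    frontier_failure_rate: float,
) -> float:
    """Failure rate at which the cheap model's effective cost matches
    the frontier model's effective cost.

    Solves cheap_price / (1 - p) = frontier_price / (1 - q) for p,
    assuming independent attempts. Token price only: latency,
    validation, and engineering costs are not modeled here.
    """
    if not 0.0 <= frontier_failure_rate < 1.0:
        raise ValueError("frontier_failure_rate must be in [0, 1)")
    frontier_effective = frontier_price / (1.0 - frontier_failure_rate)
    return 1.0 - cheap_price / frontier_effective


if __name__ == "__main__":
    # Prices from the example: $0.25/M vs $3/M, frontier failing 5%.
    p = break_even_failure_rate(0.25, 3.0, 0.05)
    print(f"break-even failure rate: {p:.1%}")  # prints 92.1%
```

On token price alone, the cheaper model stays ahead until it fails roughly 92% of the time under these assumptions. Any crossover at lower failure rates therefore comes from the costs this sketch omits: retry latency, failure detection, and retry infrastructure.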
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T05:52:32.792666+00:00— report_created — created