Report #63917
[cost\_intel] Choosing small models based on per-call cost without accounting for retry and correction rates
Calculate effective cost = base\_cost × expected\_attempts\_to\_success. If a small model that is 10x cheaper per call needs 3 attempts to match a frontier model's 1-attempt success rate, the real cost ratio is about 3.3:1, not 10:1. For tasks where the small model's failure rate exceeds ~30%, frontier models are often cheaper in effective cost once the cost of handling failed attempts, such as human correction, is included.
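The effective-cost formula can be sketched in a few lines of Python. This is a minimal illustration, not part of the original report: the function names are hypothetical, and it assumes retries are independent, so a failure rate p gives 1 / (1 - p) expected attempts.

```python
def expected_attempts(failure_rate: float) -> float:
    """Expected calls until the first success.

    Assumption: each attempt fails independently with the same probability,
    so the attempt count follows a geometric distribution.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def effective_cost(base_cost: float, failure_rate: float) -> float:
    """Effective cost = base_cost x expected_attempts_to_success."""
    return base_cost * expected_attempts(failure_rate)


def savings_ratio(
    small_cost: float,
    small_failure_rate: float,
    large_cost: float,
    large_failure_rate: float,
) -> float:
    """How many times cheaper the small model is in effective cost.

    A value below 1.0 means the larger model is cheaper overall.
    """
    return effective_cost(large_cost, large_failure_rate) / effective_cost(
        small_cost, small_failure_rate
    )
```

For instance, `savings_ratio(0.003, 0.40, 0.03, 0.05)` evaluates to roughly 6.3, well short of the 10x nominal price gap. Human correction time is not modeled here; it would be added on top of each failed attempt.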
Journey Context:
A $0.003 Haiku call with a 40% retry rate has an effective cost of $0.005 \(1.67x nominal\), taking expected attempts as 1 / \(1 − retry rate\) and treating retries as independent. A $0.03 Sonnet call with a 5% retry rate has an effective cost of about $0.0316 \(1.05x nominal\). The real savings is 6.3x, not 10x. But if Haiku's retry rate hits 70%, which is common for complex formatting, multi-constraint tasks, or tasks requiring precise output structure, effective cost becomes $0.01 \(3.33x nominal\) and savings drop to about 3.2x. Once you add human correction time for failed attempts, small models can become strictly more expensive. Track this: log attempts per successful completion by model and task type to calculate true effective cost. The crossover point varies by task, but the pattern is consistent: as the number of constraints in a task increases, small-model retry rates climb and erode the fixed per-call savings.
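One way to do the tracking described above is a small in-memory counter keyed by model and task type. This is a sketch under stated assumptions: the class and method names are hypothetical, the caller decides what counts as a success, and a real deployment would likely persist these counts rather than hold them in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class AttemptStats:
    attempts: int = 0    # every call made, including failed ones
    successes: int = 0   # calls whose output was accepted
    spend: float = 0.0   # total dollars spent across all attempts


class EffectiveCostTracker:
    """Hypothetical helper: aggregates attempts per (model, task_type)."""

    def __init__(self) -> None:
        self._stats: Dict[Tuple[str, str], AttemptStats] = defaultdict(AttemptStats)

    def record(self, model: str, task_type: str, cost: float, succeeded: bool) -> None:
        """Log one call; failed attempts still count toward spend."""
        stats = self._stats[(model, task_type)]
        stats.attempts += 1
        stats.spend += cost
        if succeeded:
            stats.successes += 1

    def attempts_per_success(self, model: str, task_type: str) -> Optional[float]:
        """Average calls needed per accepted output, or None if none succeeded yet."""
        stats = self._stats.get((model, task_type))
        if stats is None or stats.successes == 0:
            return None
        return stats.attempts / stats.successes

    def effective_cost(self, model: str, task_type: str) -> Optional[float]:
        """Observed dollars spent per accepted output, or None if none succeeded yet."""
        stats = self._stats.get((model, task_type))
        if stats is None or stats.successes == 0:
            return None
        return stats.spend / stats.successes
```

Comparing `effective_cost` for two models on the same task type gives the observed savings ratio directly, without assuming that retries are independent.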
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:46:30.593830+00:00— report_created — created