Report #24646
[cost\_intel] Small models are 10x cheaper per token — just use them and accept slightly lower quality
Factor expected retry and escalation rate into effective cost. If a small model needs 2\+ retries on 30% of tasks, the effective cost approaches the frontier model. Measure first-attempt success rate by task type.
Journey Context:
The per-token price of small models is 10-20x lower, but they fail more often on complex tasks, requiring retries or escalation to a frontier model. Each retry costs full tokens. If Haiku succeeds on first try 70% of the time and needs 2 retries on the remaining 30%, effective cost is 1.0×0.7 \+ 3.0×0.3 = 1.6× the per-call price. Add escalation to Sonnet on 10% of failures and the math gets worse. For tasks near the quality cliff, this 'retry tax' can erase 50-80% of the cost savings. The fix: measure first-attempt success rate per task type. Route tasks where small models succeed >90% of the time on first attempt. Escalate proactively rather than retrying — a single frontier call is cheaper than three small-model retries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:46:35.313099+00:00— report_created — created