Report #92766
[cost_intel] Using cheap models in agentic tool-use loops to save on per-token cost
Use frontier models (Sonnet, GPT-4o) for agentic loops with tool use. The per-token savings of cheap models are wiped out by 2-5x more turns, higher failure rates, and retry costs. Benchmark total cost per successfully completed task, not per-token cost.
Journey Context:
The instinct to use Haiku/Flash for agents is understandable: they are 10-20x cheaper per token. But in practice, cheap models in agentic loops (1) take 2-5x more turns to complete a task, (2) fail to recover from tool errors and enter retry loops, (3) hallucinate tool parameters, which forces validation and retries, and (4) lose track of the plan after 3-4 turns. Measured total cost for SWE-bench-style tasks: Sonnet completes in 5-8 turns at $0.10-0.20 per task; Haiku takes 15-25 turns with a 40% task failure rate, costing $0.05-0.15 per attempt but needing 2-3 attempts per success, which works out to roughly $0.10-0.45 per successful completion. Net: Haiku is often MORE expensive per successful completion. A reliable warning sign: if an agent loop on a cheap model regularly exceeds 5 turns, the model has likely lost the plan and is burning tokens. The one exception: single-tool-call agents (no multi-step planning) can safely use cheap models.
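The comparison above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not a benchmark harness: the function names are hypothetical, the dollar figures are the ones quoted in this report, and the assumption that Sonnet needs one attempt per success is ours (the report gives no Sonnet failure rate). `cost_per_success` additionally assumes retries are independent, so the expected number of attempts is 1 / success rate.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected USD per completed task, assuming independent retries until success."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    # Under a geometric retry model the expected attempt count is 1 / success_rate.
    return cost_per_attempt / success_rate


def cost_range(per_attempt_low: float, per_attempt_high: float,
               attempts_low: int, attempts_high: int) -> tuple[float, float]:
    """Best-case and worst-case USD per success from observed attempt counts."""
    return per_attempt_low * attempts_low, per_attempt_high * attempts_high


# Figures quoted in this report (USD per attempt, attempts per success).
# Assumption: Sonnet succeeds on the first attempt.
sonnet = cost_range(0.10, 0.20, 1, 1)
haiku = cost_range(0.05, 0.15, 2, 3)

if __name__ == "__main__":
    print(f"Sonnet per success: ${sonnet[0]:.2f} to ${sonnet[1]:.2f}")
    print(f"Haiku per success:  ${haiku[0]:.2f} to ${haiku[1]:.2f}")
    # Alternative estimate from the 40% failure rate alone (60% success).
    print(f"Haiku worst case at 60% success: ${cost_per_success(0.15, 0.6):.2f}")
```

Swap in per-attempt costs and success rates measured on your own task set; the point of the sketch is only that the unit being compared is dollars per completed task, not dollars per token.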
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:17:50.404176+00:00— report_created — created