Report #49648
[cost_intel] Using Sonnet for multi-step tool-calling loops where latency and cost compound
Use Haiku for tool-calling agents with more than 5 tool calls per workflow: Haiku's roughly 3x lower latency and about 12x lower cost per token offset its 15% higher error rate in tool selection, and failed runs can be retried cheaply.
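The routing rule above can be written as a small decision helper. This is a minimal sketch. `Workflow`, `choose_model`, and `TOOL_CALL_THRESHOLD` are hypothetical names, and the returned strings are tier labels, not real model IDs. The report does not cover short (5 or fewer calls) non-customer-facing loops, so keeping the Sonnet default there is an assumption.

```python
from dataclasses import dataclass

# Hypothetical threshold taken from this report's rule of thumb (> 5 tool calls).
TOOL_CALL_THRESHOLD = 5


@dataclass
class Workflow:
    expected_tool_calls: int  # tool calls per workflow run
    customer_facing: bool     # True when an error is expensive


def choose_model(wf: Workflow) -> str:
    """Return a tier label, 'haiku' or 'sonnet', for a tool-calling workflow."""
    # Customer-facing work, where an error is expensive, stays on Sonnet.
    if wf.customer_facing:
        return "sonnet"
    # Long loops: cheap, fast Haiku calls plus retries beat Sonnet on cost and latency.
    if wf.expected_tool_calls > TOOL_CALL_THRESHOLD:
        return "haiku"
    # Assumption: the report is silent on short internal loops; keep the Sonnet default.
    return "sonnet"
```

For example, `choose_model(Workflow(expected_tool_calls=10, customer_facing=False))` returns `"haiku"`, while the same loop with `customer_facing=True` returns `"sonnet"`.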
Journey Context:
Agent loops invoke the model multiple times: plan -> call tool -> observe -> plan -> call tool. With Sonnet at $3/1M input tokens and Haiku at $0.25/1M, a 10-turn loop with 2k input tokens per turn (20k tokens total) costs $0.06 on Sonnet vs $0.005 on Haiku. If Haiku completes the loop 90% of the time vs 99% for Sonnet, retrying failures means an expected 1/0.90 ≈ 1.11 runs vs 1/0.99 ≈ 1.01 runs, or about $0.0056 vs $0.0606 per successful loop. The cost gap is about 12x per token and still roughly 11x after retries. Latency matters just as much: Haiku is 3x faster, which shortens the wall-clock time of every turn in the loop. For tool use off the critical path (data enrichment, non-customer-facing work), Haiku wins. For customer-facing work where an error is expensive, use Sonnet.
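The retry arithmetic above can be checked with a short script. This is a sketch under stated assumptions: prices are the per-million input-token figures quoted in this report, output tokens are ignored (as in the report's own numbers), runs fail independently, and a failed run is rerun in full, so the expected number of runs is 1 / success_rate. The function names are hypothetical.

```python
# Per-million input-token prices quoted in this report (output tokens ignored).
PRICE_PER_M_INPUT = {"sonnet": 3.00, "haiku": 0.25}


def loop_cost(model: str, turns: int = 10, tokens_per_turn: int = 2_000) -> float:
    """Input cost in dollars of one full pass through an agent loop."""
    total_tokens = turns * tokens_per_turn
    return total_tokens * PRICE_PER_M_INPUT[model] / 1_000_000


def expected_cost(model: str, success_rate: float,
                  turns: int = 10, tokens_per_turn: int = 2_000) -> float:
    """Expected cost per successful loop when failed loops are rerun in full.

    Attempts until first success are geometric, so the expected number of
    runs is 1 / success_rate (about 1.11 at 90%, about 1.01 at 99%).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return loop_cost(model, turns, tokens_per_turn) / success_rate


if __name__ == "__main__":
    s = expected_cost("sonnet", 0.99)
    h = expected_cost("haiku", 0.90)
    print(f"Sonnet ${s:.4f}, Haiku ${h:.4f}, ratio {s / h:.1f}x")
```

With the report's numbers this prints `Sonnet $0.0606, Haiku $0.0056, ratio 10.9x`, which is where the "roughly 11x after retries" figure comes from. Treating each failure as a full rerun is pessimistic; if only the failed step is retried, Haiku's effective cost is lower still.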
Lifecycle
2026-06-19T13:49:12.038525+00:00— report_created — created