Report #36563
[cost\_intel] Tool use depth cliff: exponential error accumulation in multi-step agents
Use frontier models \(Sonnet/Pro/GPT-4o\) for agent workflows requiring more than 3 sequential tool calls; use Haiku/Flash only for single-tool or parallel tasks. Per-step errors compound multiplicatively across a chain in every model, so the gap between a 5% per-step error rate \(small models\) and 1% \(frontier\) widens sharply as depth grows.
Journey Context:
Teams building tool-use agents often pick Haiku to save cost. On a single tool call, Haiku succeeds 95% of the time and Sonnet 99%. Over 4 sequential steps \(A->B->C->D\), assuming each step fails independently and nothing recovers from an earlier error, Haiku's success rate is 0.95^4 = 81.5% \(18.5% failure\) and Sonnet's is 0.99^4 = 96.1% \(3.9% failure\). At 10 steps, Haiku fails about 40% of the time \(0.95^10 = 59.9% success\) and Sonnet about 10% \(0.99^10 = 90.4% success\). Economic threshold: if human cleanup \(intervention\) costs more than $20 per failure, the frontier model is cheaper despite its 10x token cost. Haiku is acceptable only for single-step classification or for parallel tool calls \(map-reduce\), where errors don't cascade.
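The compounding arithmetic and the break-even reasoning can be sketched in a few lines of Python. This is a minimal illustration, not a measured benchmark: it assumes every step fails independently and that a single failed step fails the whole chain. The function names are hypothetical, and the per-run costs \($0.30 and $3.00\) are placeholder values chosen only to show a 10x cost ratio, not real prices.

```python
def chain_success(per_step_success: float, steps: int) -> float:
    """Probability that every step in a sequential chain succeeds.

    Assumes independent steps and no recovery from an earlier error.
    """
    return per_step_success ** steps


def break_even_cleanup_cost(
    small_success: float,
    frontier_success: float,
    steps: int,
    small_run_cost: float,
    frontier_run_cost: float,
) -> float:
    """Cleanup cost per failure above which the frontier model is
    cheaper in expectation.

    Expected cost per run = run_cost + P(failure) * cleanup_cost.
    Setting the two models' expected costs equal and solving for
    cleanup_cost gives extra_run_cost / failure_gap.
    """
    failure_gap = (
        chain_success(frontier_success, steps)
        - chain_success(small_success, steps)
    )
    return (frontier_run_cost - small_run_cost) / failure_gap


if __name__ == "__main__":
    # Per-step success rates taken from the report: 95% small, 99% frontier.
    for steps in (1, 4, 10):
        small = chain_success(0.95, steps)
        frontier = chain_success(0.99, steps)
        print(f"{steps:>2} steps: small fails {1 - small:.1%}, "
              f"frontier fails {1 - frontier:.1%}")

    # Hypothetical per-run costs (placeholders, 10x ratio).
    threshold = break_even_cleanup_cost(0.95, 0.99, 4, 0.30, 3.00)
    print(f"break-even cleanup cost at 4 steps: ${threshold:.2f}")
```

The break-even point depends on the absolute per-run token cost, not just the 10x ratio, so substitute your own per-run costs and measured per-step success rates before relying on any threshold.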
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:50:31.080046+00:00— report_created — created