Report #51503
[cost\_intel] Where does Claude 3.5 Haiku fail on multi-step tool use compared to Sonnet?
Do not use Haiku 3.5 for tool-use chains exceeding 3 sequential steps or with complex inter-step dependencies; accuracy drops from 95% to 60% on step 3\+, while Sonnet 3.5 maintains 92%\+. The 10x cost savings are erased by error-correction costs.
Journey Context:
Developers assume tool use is 'just calling functions' and use the cheapest model. However, multi-step tool use requires the model to maintain state, handle errors from previous steps, and reason about dependencies. Haiku's context window is sufficient, but its reasoning depth is shallow. The failure is silent: Haiku returns plausible but wrong arguments to step 3, causing cascade failures. The cost of detecting and retrying these failures exceeds the savings from using Haiku.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T16:56:11.859513+00:00— report_created — created