Report #21707
[cost\_intel] How to choose between fast, cheap models and slow, expensive models for tool-use loops?
Use a 'cascade' pattern: Haiku/Flash for tool selection and simple parameter extraction, Sonnet/Pro only for complex generation and final answer synthesis. The reported result is a 60% cut in latency and an 80% cut in cost, while maintaining 95% of Sonnet's accuracy on multi-tool workflows.
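A minimal sketch of the cascade loop, assuming each model is wrapped as a plain callable from prompt to text. `run_cascade`, the `cheap` and `strong` parameters, and the "tool_name: argument" / "DONE" reply format are hypothetical names chosen for illustration, not part of any provider SDK.

```python
from typing import Callable, Dict, List

# Hypothetical types: a model or tool is any callable mapping text to text.
Model = Callable[[str], str]
Tool = Callable[[str], str]


def run_cascade(question: str, tools: Dict[str, Tool],
                cheap: Model, strong: Model, max_steps: int = 5) -> str:
    """Cheap model drives the tool loop; strong model writes the final answer."""
    observations: List[str] = []
    for _ in range(max_steps):
        # Tool selection and parameter extraction go to the cheap model.
        # Assumed reply format: "tool_name: argument", or "DONE" to stop.
        reply = cheap(
            f"Question: {question}\nTools: {sorted(tools)}\n"
            f"Observations so far: {observations}\nNext tool call?"
        ).strip()
        if reply == "DONE":
            break
        name, _, arg = reply.partition(":")
        tool = tools.get(name.strip())
        if tool is None:
            # Record the bad choice so the next iteration can correct it.
            observations.append(f"unknown tool {name.strip()!r}")
            continue
        observations.append(tool(arg.strip()))
    # Only the final synthesis is sent to the expensive model.
    return strong(f"Question: {question}\nObservations: {observations}\nAnswer:")
```

With this shape the expensive model is called exactly once per question, however many tool steps the loop takes.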
Journey Context:
Agents often use Sonnet for every step in a ReAct loop: 'thought -> tool choice -> observation -> final answer'. This is wasteful. Given good tool descriptions, Haiku picks the right tool out of 5 about 90% of the time. Its failure mode is complex parameter generation (e.g., generating a SQL query with joins), where it hallucinates columns. The cascade pattern routes: (1) Intent classification -> Haiku, (2) Simple tool execution -> Haiku, (3) Complex generation/reasoning -> Sonnet. This requires building a router that detects complexity from signals such as the token count of the context and a per-tool complexity score. Teams often resist this because of the 'added complexity', but the economics are hard to argue with: a 10-step Sonnet loop costs $0.15, while a Haiku-Sonnet cascade costs $0.03.
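The router described above could be sketched as follows. This is only an illustration under stated assumptions: the `Step` fields, the two thresholds, and the 0-10 tool complexity scale are hypothetical and would need tuning against your own evaluation set. The `savings` helper just reproduces the cost arithmetic from the figures quoted above.

```python
from dataclasses import dataclass


@dataclass
class Step:
    kind: str             # "intent", "tool_call", or "generation"
    context_tokens: int   # approximate token count of the prompt context
    tool_complexity: int  # hypothetical 0-10 score assigned per tool


# Hypothetical thresholds, not taken from the report.
MAX_CHEAP_CONTEXT_TOKENS = 4000
MAX_CHEAP_TOOL_COMPLEXITY = 3


def pick_model(step: Step) -> str:
    """Return "cheap" (Haiku-class) or "strong" (Sonnet-class) for one step."""
    if step.kind == "intent":
        return "cheap"  # (1) intent classification
    if step.kind == "tool_call":
        simple = (step.context_tokens <= MAX_CHEAP_CONTEXT_TOKENS
                  and step.tool_complexity <= MAX_CHEAP_TOOL_COMPLEXITY)
        # (2) simple tool execution stays cheap; anything heavier escalates.
        return "cheap" if simple else "strong"
    return "strong"  # (3) complex generation or reasoning


def savings(baseline_cost: float, cascade_cost: float) -> float:
    """Fractional cost reduction of the cascade relative to the baseline."""
    return 1 - cascade_cost / baseline_cost
```

Plugging in the quoted figures, (0.15 - 0.03) / 0.15 = 0.80, which matches the 80% cost reduction claimed in the summary. A tool that builds SQL with joins would get a high `tool_complexity` score so that it escalates to the stronger model.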
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:50:50.505048+00:00— report_created — created