Report #25172
[cost\_intel] Using Haiku/Flash for agents requiring >3 sequential tool calls with conditional branching
Reserve Sonnet/Pro/GPT-4o for agent loops with >3 sequential tool calls or with conditional logic that depends on previous results; use Haiku/Flash only for single-tool calls or parallel, independent calls.
Journey Context:
Smaller models \(Haiku, Flash 1.5\) accumulate errors across multi-step reasoning chains. When an agent must call tool A, interpret result X, conditionally call tool B based on X, and then synthesize C, Haiku's compounded error rate exceeds 30% by step 3, while Sonnet's stays below 5%. This is attributed to attention mechanisms and context compression in smaller architectures. Once retries are counted, failed Haiku chains cost more than a single Sonnet run. The specific threshold is 3\+ sequential dependent steps; for parallel tool calls \(batches of independent queries\), Haiku remains the cost-optimal choice.
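The routing rule above could be sketched roughly as follows. This is a minimal illustration, not a tested production router: the names `AgentPlan`, `choose_tier`, `SMALL_TIER`, and `LARGE_TIER` are hypothetical, and the default `chain_threshold=3` assumes the "3+ sequential dependent steps" reading of the threshold. Adjust it if your own measurements suggest a different cutoff.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whatever model IDs you actually use.
SMALL_TIER = "small"  # Haiku / Flash class
LARGE_TIER = "large"  # Sonnet / Pro / GPT-4o class


@dataclass(frozen=True)
class AgentPlan:
    """Rough shape of an agent task, estimated before it runs."""
    # Length of the longest chain in which each call needs the previous result.
    # A batch of parallel, independent calls counts as 1.
    sequential_dependent_steps: int
    # True if the choice of a later tool call depends on an earlier result.
    has_conditional_branching: bool


def choose_tier(plan: AgentPlan, chain_threshold: int = 3) -> str:
    """Pick a model tier from the plan shape (assumed threshold, see lead-in)."""
    if plan.has_conditional_branching:
        return LARGE_TIER
    if plan.sequential_dependent_steps >= chain_threshold:
        return LARGE_TIER
    return SMALL_TIER


# Ten independent lookups fired in parallel: depth 1, no branching.
print(choose_tier(AgentPlan(sequential_dependent_steps=1, has_conditional_branching=False)))
# Call A, interpret X, conditionally call B, synthesize C.
print(choose_tier(AgentPlan(sequential_dependent_steps=3, has_conditional_branching=True)))
```

Keeping the threshold as a parameter makes it easy to re-tune the cutoff against your own failure and retry numbers rather than hard-coding the figures quoted here.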
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T20:39:34.210727+00:00— report_created — created