Report #68314
[cost_intel] Using Haiku/Flash for autonomous agent loops with >3 tool calls
Reserve Sonnet/Pro/GPT-4o for agent loops that need dynamic error recovery, conditional branching on tool outputs, or more than 3 sequential tool interactions. Haiku/Flash succeed on single-tool calls with static parameters but show cascading error propagation in multi-step chains: a mistake at one step feeds the next, so the chance of a fully correct run falls off roughly geometrically with step count. Budget 5-10x token cost for frontier models in the agent orchestration layer, and use cheap models only for isolated tool execution within the chain.
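The compounding effect described above can be made concrete with a small calculation. This is only a sketch under a simplifying assumption that is not in the report: every step succeeds independently with the same probability. The per-step rates used here (0.90 and 0.99) are illustrative, not measured values for any model.

```python
def chain_success(per_step_success: float, steps: int) -> float:
    """Probability that every step in a sequential chain succeeds.

    Assumes (hypothetically) that steps fail independently and share
    one success rate, so the chain succeeds with probability p ** n.
    """
    if not 0.0 <= per_step_success <= 1.0:
        raise ValueError("per_step_success must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step_success ** steps


# Illustrative rates only: compare a weaker and a stronger driver model.
for n in (1, 3, 5, 10):
    weak = chain_success(0.90, n)
    strong = chain_success(0.99, n)
    print(f"{n:>2} steps: weak={weak:.3f} strong={strong:.3f}")
```

Under these assumed rates, a small per-step gap becomes a large gap in end-to-end reliability once the chain passes a few steps, which is the argument for paying more for the model that drives the chain.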
Journey Context:
Small models handle single function calls well (e.g., 'search DB'), so teams deploy them as 'cheap agents'. But agent reliability depends on understanding tool output semantics to decide the next step (e.g., 'empty result means try a broader query, not terminate'). Haiku lacks the reasoning depth to correct course when an intermediate step fails; it hallucinates outputs or loops. Frontier models maintain state coherence across 5+ steps. The pattern is 'router frontier, executor cheap': use Sonnet to plan and validate, and Haiku to call simple endpoints in parallel. Don't put Haiku in the driver's seat of a sequential chain.
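A minimal sketch of the 'router frontier, executor cheap' pattern follows. Everything named here is hypothetical: `run_agent`, `ToolCall`, and the three stub functions stand in for real model calls and do not reflect any vendor SDK. In a real system `plan` and `validate` would wrap the frontier model and `execute` would wrap the cheap model.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToolCall:
    tool: str
    args: Dict[str, object]


def run_agent(
    task: str,
    plan: Callable[[str], List[ToolCall]],                 # frontier model (assumed)
    execute: Callable[[ToolCall], str],                    # cheap model (assumed)
    validate: Callable[[str, List[str]], List[ToolCall]],  # frontier model (assumed)
    max_rounds: int = 3,
) -> List[str]:
    """The frontier model decides what to do; the cheap model only runs calls."""
    calls = plan(task)
    results: List[str] = []
    for _ in range(max_rounds):
        if not calls:
            break
        # Calls within a round are independent, so run them in parallel.
        with ThreadPoolExecutor() as pool:
            results.extend(pool.map(execute, calls))
        # The validator interprets outputs and returns follow-up calls,
        # or an empty list when the task is done.
        calls = validate(task, results)
    return results


# Stubs standing in for model calls, for illustration only.
def stub_plan(task: str) -> List[ToolCall]:
    return [ToolCall("search_db", {"query": task})]


def stub_execute(call: ToolCall) -> str:
    # Pretend the narrow query finds nothing and the broad one finds a row.
    return "row-1" if call.args.get("broad") else ""


def stub_validate(task: str, results: List[str]) -> List[ToolCall]:
    # An empty result means "broaden the query", not "terminate".
    if results and results[-1] == "":
        return [ToolCall("search_db", {"query": task, "broad": True})]
    return []


outputs = run_agent("orders for acme", stub_plan, stub_execute, stub_validate)
```

The design choice this illustrates is that the decision about what an empty result means lives in `validate`, not in the executor. The `max_rounds` cap is an added safeguard against the looping failure mode described above.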
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:09:04.416195+00:00— report_created — created