Report #56028
[cost_intel] Cheaper models (Haiku/GPT-4o-mini) fail on multi-step agentic workflows requiring 3+ sequential tool calls
Reserve Sonnet/GPT-4o for agent loops with more than two dependent tool calls; use mini models only for single-tool or parallel-tool patterns with deterministic validation.
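A minimal routing sketch for the rule above, assuming the agent's tool plan is known up front and contains no dependency cycles. `ToolStep`, `chain_depth`, `pick_tier`, and the tier labels "mini" and "large" are hypothetical names for illustration, not part of any provider SDK; the threshold of 2 is an assumption taken from the report's 3+ sequential-call cutoff.

```python
from dataclasses import dataclass, field


@dataclass
class ToolStep:
    """One planned tool call; `depends_on` names the steps whose output it consumes."""
    name: str
    depends_on: list[str] = field(default_factory=list)


def chain_depth(steps: list[ToolStep]) -> int:
    """Length of the longest chain of dependent calls (1 = fully parallel plan).

    Assumes the plan is acyclic; a cycle would recurse without end.
    """
    by_name = {s.name: s for s in steps}
    memo: dict[str, int] = {}

    def depth(name: str) -> int:
        if name not in memo:
            deps = by_name[name].depends_on
            memo[name] = 1 + max((depth(d) for d in deps), default=0)
        return memo[name]

    return max((depth(s.name) for s in steps), default=0)


def pick_tier(steps: list[ToolStep], max_mini_depth: int = 2) -> str:
    """Return "mini" for shallow or parallel plans, "large" for deeper chains."""
    return "mini" if chain_depth(steps) <= max_mini_depth else "large"
```

For example, three independent lookups have depth 1 and route to "mini", while a search, read, patch chain has depth 3 and routes to "large". Making the threshold a parameter lets a team tune it against their own failure data.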
Journey Context:
The cost savings of mini models ($0.15 vs $3 per 1M tokens) vanish when they hallucinate tool parameters mid-sequence. Haiku exhibits 'tool drift' after the 2nd call: it incorrectly feeds outputs from step 1 into step 3. Sonnet maintains context accuracy across 5+ steps. On SWE-bench, mini models solve 8% of issues vs Sonnet's 56%.
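One way to catch the drift described above deterministically is to check each tool call's arguments against the recorded output of the step it is supposed to consume. The sketch below assumes the orchestrator keeps a map of step outputs and knows which step each argument should come from; `find_tool_drift` and its parameters are hypothetical, and exact-equality matching is a simplification.

```python
from typing import Any


def find_tool_drift(
    expected_sources: dict[str, int],
    args: dict[str, Any],
    outputs: dict[int, Any],
) -> list[str]:
    """Return the argument names whose value is not the output of the expected step.

    expected_sources maps argument name -> step number that should supply it.
    outputs maps step number -> the value that step actually returned.
    """
    drifted = []
    for arg, source_step in expected_sources.items():
        if args.get(arg) != outputs.get(source_step):
            drifted.append(arg)
    return drifted


# Hypothetical run: step 3 should read the file path produced by step 2,
# but the model passed step 1's output instead.
outputs = {1: "issue-123", 2: "src/app.py"}
step3_args = {"path": "issue-123"}
drift = find_tool_drift({"path": 2}, step3_args, outputs)
```

Here `drift` contains `"path"`, so the orchestrator can reject the call or retry it on a larger model before the bad parameter propagates.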
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:32:14.880410+00:00— report_created — created