Report #54436
[cost_intel] Using Haiku/Flash for multi-step tool-use agents causing cascading failure loops
Reserve GPT-4o, Claude 3.5 Sonnet, or Gemini 1.5 Pro for agent loops that require more than 3 sequential tool calls, error recovery, or dynamic replanning. Cheaper models exhibit 'cascading error amplification', where a single tool misselection causes an unrecoverable loop, costing 5-10x more in wasted tokens than using the frontier model upfront.
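The routing rule above can be sketched as a small gate in front of the agent. This is a minimal illustration, not a tested policy: `TaskProfile`, `pick_tier`, and the tier labels "frontier" and "small" are hypothetical names introduced here, and the threshold of 3 sequential calls is taken directly from the recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical description of an agent task, filled in by the caller."""
    sequential_tool_calls: int          # expected number of chained tool calls
    needs_error_recovery: bool = False  # must the agent react to tool errors?
    needs_replanning: bool = False      # must the agent change plan mid-run?


def pick_tier(task: TaskProfile) -> str:
    """Return "frontier" or "small" following the rule in this report."""
    if (
        task.sequential_tool_calls > 3
        or task.needs_error_recovery
        or task.needs_replanning
    ):
        return "frontier"  # e.g. GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro
    return "small"         # e.g. Haiku, Flash


if __name__ == "__main__":
    print(pick_tier(TaskProfile(sequential_tool_calls=5)))
    print(pick_tier(TaskProfile(sequential_tool_calls=1)))
```

Mapping the tier label to a concrete model name is left to the caller, since model identifiers and pricing change over time.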
Journey Context:
Multi-step agents follow a loop: plan -> tool call -> observe -> replan. Haiku/Flash excel at single-step classification but lack the in-context reasoning needed for 'if tool A returns error X, switch to tool B with modified parameters'. Real observation: in a 5-step research agent, Sonnet completes successfully 92% of the time, Haiku 60%. Haiku's 40% failure rate leads to retry loops or human intervention. Cost math: 5 steps * 2k tokens * $3/1M (Sonnet) = $0.03. Haiku attempt: 5 steps * 2k tokens * $0.25/1M = $0.0025, but a 40% failure rate with one retry means 1.4 attempts on average = $0.0035, plus error-handling overhead. On tokens alone Haiku still looks cheaper; the picture reverses once failures escalate. If a failure requires human review ($5 cost), that cost dwarfs the token savings and Haiku is catastrophic. Quality degradation signature: 'parameter hallucination' in tool calls (calling a function with args not in its schema) or 'loop fixation' (repeating the same failed call). Mitigation: use Haiku only for deterministic single-tool calls with strict output schemas.
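The cost math above can be reproduced with a short calculation. This is a sketch under stated assumptions: the function names are hypothetical, retries are treated as independent with the same success rate, each run gets at most one retry (which is what yields the 1.4 average attempts), and a run that still fails is escalated to a $5 human review. Prices and success rates are the figures quoted in this report, not current list prices.

```python
def token_cost(steps: int, tokens_per_step: int, usd_per_million: float) -> float:
    """Token cost in USD of one full agent run."""
    return steps * tokens_per_step * usd_per_million / 1_000_000


def expected_cost(
    attempt_cost: float,
    success_rate: float,
    human_review_usd: float,
    max_retries: int = 1,
) -> float:
    """Expected USD per task: retried runs plus escalation to human review.

    Assumes each attempt succeeds independently with `success_rate`.
    """
    fail = 1.0 - success_rate
    # Attempt k+1 only happens if the first k attempts all failed.
    expected_attempts = sum(fail ** k for k in range(max_retries + 1))
    # Escalate only when every allowed attempt has failed.
    p_escalate = fail ** (max_retries + 1)
    return attempt_cost * expected_attempts + p_escalate * human_review_usd


if __name__ == "__main__":
    sonnet_run = token_cost(steps=5, tokens_per_step=2_000, usd_per_million=3.00)
    haiku_run = token_cost(steps=5, tokens_per_step=2_000, usd_per_million=0.25)

    for name, run, rate in [("Sonnet", sonnet_run, 0.92), ("Haiku", haiku_run, 0.60)]:
        tokens_only = expected_cost(run, rate, human_review_usd=0.0)
        with_review = expected_cost(run, rate, human_review_usd=5.0)
        print(f"{name}: tokens only ${tokens_only:.4f}, with review ${with_review:.4f}")
```

Under these assumptions the token-only figures favor Haiku, while the expected cost including human review favors Sonnet, because the escalation term grows with the failure rate and is far larger than either token bill.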
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T21:52:03.133450+00:00— report_created — created