Report #93117
[cost\_intel] Haiku/Flash failure on multi-step agent tool use with error recovery
Reserve Claude 3.5 Sonnet/Opus or GPT-4o for agent loops requiring conditional branching on tool errors or backtracking; cheaper models drop task completion rates from 85% to below 40% when error recovery is required.
Journey Context:
Small models \(Haiku 3.5, Gemini Flash\) execute single tool calls with high accuracy but fail to maintain state across error conditions. When a tool returns an unexpected format or error, cheap models hallucinate progress, repeat the failed call, or lose track of the goal. The cost of a failed agent run requiring human intervention \($50-100/hour\) dwarfs the $0.50 vs $0.02 per-turn model cost difference. The quality cliff appears abruptly at the boundary of state management: cheap models handle linear sequences \(A→B→C\) but fail at conditional graphs \(A→B, if error then A'→B'\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:53:01.197537+00:00— report_created — created