Report #65699
[cost\_intel] Assuming Haiku/Flash can handle autonomous agent loops with tool use and error correction
Reserve Claude 3.5 Sonnet or GPT-4o for agent workflows that require more than 2 sequential tool calls with conditional logic based on intermediate results. Switching to cheaper models drops accuracy from 85% to <40% on 3\+ hop reasoning.
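A minimal sketch of the routing rule above, assuming the workflow's shape can be estimated before it runs. The tier labels, `WorkflowProfile`, and `pick_tier` are hypothetical names for illustration, not part of any provider SDK; map the tiers to whatever model IDs you actually use.

```python
from dataclasses import dataclass

# Hypothetical tier labels (assumption): map these to real model IDs yourself.
CHEAP_TIER = "cheap"    # Haiku / Flash class
STRONG_TIER = "strong"  # Claude 3.5 Sonnet / GPT-4o class


@dataclass(frozen=True)
class WorkflowProfile:
    """Rough up-front estimate of an agent workflow's shape (assumed inputs)."""
    max_sequential_tool_calls: int
    branches_on_intermediate_results: bool


def pick_tier(profile: WorkflowProfile) -> str:
    """Route to the strong tier only when both conditions from the report hold:
    more than 2 sequential tool calls AND conditional logic on their results."""
    needs_strong = (
        profile.max_sequential_tool_calls > 2
        and profile.branches_on_intermediate_results
    )
    return STRONG_TIER if needs_strong else CHEAP_TIER


if __name__ == "__main__":
    # Single retrieve-then-answer call: the cheap tier is expected to cope.
    print(pick_tier(WorkflowProfile(1, False)))
    # Four chained calls that branch on intermediate results: use the strong tier.
    print(pick_tier(WorkflowProfile(4, True)))
```

The threshold is taken directly from the recommendation; treat it as a starting point and tune it against your own evaluation data.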
Journey Context:
Tool use requires the model to: \(1\) generate correct JSON/arguments, \(2\) interpret tool results, \(3\) decide the next action. Haiku/Flash excel at single tool calls \(retrieve, then answer\) but fail when the tool returns an error that requires a change of strategy \(e.g., 'search returned no results, try broader query'\). Signatures of quality degradation: infinite loops, hallucinating tool results instead of calling tools, or ignoring tool errors and answering from training data. SWE-bench and similar benchmarks show Sonnet-level models are 3-5x better at multi-step tool use.
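The three degradation signatures described above can be checked mechanically on a recorded agent trace. The sketch below is one possible heuristic, not a validated detector: the `Step` record, its fields, the signal names, and the `max_repeats` default are all assumptions made for illustration, and an answer that follows a tool error is not always wrong (the model may be correctly reporting the failure).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    """One turn in a hypothetical agent trace (field names are assumptions)."""
    kind: str                   # "tool_call", "tool_result", or "answer"
    name: Optional[str] = None  # tool name, for calls and results
    args: Optional[str] = None  # serialized arguments, for calls
    is_error: bool = False      # True when a tool_result reports a failure
    origin: str = "runtime"     # "runtime" if the harness produced it, "model" otherwise


def degradation_signals(trace: List[Step], max_repeats: int = 3) -> List[str]:
    """Return the degradation signatures found in a trace, in a fixed order."""
    signals: List[str] = []

    # Signature 1: looping, approximated as the same call (name + args)
    # appearing at least max_repeats times.
    counts = {}
    for step in trace:
        if step.kind == "tool_call":
            key = (step.name, step.args)
            counts[key] = counts.get(key, 0) + 1
    if any(n >= max_repeats for n in counts.values()):
        signals.append("repeated_identical_call")

    # Signature 2: a tool result written by the model rather than the runtime.
    if any(s.kind == "tool_result" and s.origin == "model" for s in trace):
        signals.append("hallucinated_tool_result")

    # Signature 3: a final answer immediately after an error result,
    # with no retry or change of strategy in between.
    for prev, nxt in zip(trace, trace[1:]):
        if prev.kind == "tool_result" and prev.is_error and nxt.kind == "answer":
            signals.append("answered_after_error")
            break

    return signals


if __name__ == "__main__":
    trace = [
        Step("tool_call", "search", '{"q": "narrow query"}'),
        Step("tool_result", "search", is_error=True),
        Step("answer"),
    ]
    print(degradation_signals(trace))
```

A non-empty result could be used as a trigger to retry the task on a stronger model, which keeps the cheap model on the simple path and escalates only when the trace looks unhealthy.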
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:45:25.400253+00:00— report_created — created