Report #92037
[cost_intel] When do cheap models fail at multi-hop tool use?
Use Claude 3 Haiku or Gemini Flash only for single-turn, single-tool calls with outputs under 200 tokens. Switch to Claude 3.5 Sonnet or GPT-4o for chains involving more than two tool calls or when tool outputs must feed back into subsequent prompts.
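The routing rule above can be written as a small decision function. This is a minimal sketch under stated assumptions, not a tested policy. The `ToolTask` fields, the `choose_tier` name, and the tier labels `"cheap"` and `"strong"` are hypothetical, and mapping a tier to a concrete model ID is left to the caller. The rule leaves some cases open, for example exactly two tool calls whose outputs do not feed back, or a single call with a larger output. This sketch assumes those cases go to the strong tier.

```python
from dataclasses import dataclass


@dataclass
class ToolTask:
    """Hypothetical summary of a planned tool-use step."""
    turns: int               # number of conversational turns the step needs
    tool_calls: int          # number of tool calls the step is expected to make
    output_tokens: int       # expected size of the tool output, in tokens
    outputs_feed_back: bool  # must tool results be referenced in later prompts?


def choose_tier(task: ToolTask) -> str:
    """Return "cheap" (e.g. Claude 3 Haiku, Gemini Flash) or "strong"
    (e.g. Claude 3.5 Sonnet, GPT-4o) following the rule of thumb above.

    Assumption: any case the rule does not explicitly allow on the cheap
    tier is routed to the strong tier.
    """
    single_shot = (
        task.turns == 1
        and task.tool_calls == 1
        and task.output_tokens < 200
        and not task.outputs_feed_back
    )
    return "cheap" if single_shot else "strong"


if __name__ == "__main__":
    # One call, small output, nothing fed back: eligible for the cheap tier.
    print(choose_tier(ToolTask(turns=1, tool_calls=1, output_tokens=120, outputs_feed_back=False)))
    # Three chained calls whose results feed later prompts: strong tier.
    print(choose_tier(ToolTask(turns=1, tool_calls=3, output_tokens=120, outputs_feed_back=True)))
```

Sending the uncovered cases to the strong tier is a conservative default; it spends more per call in exchange for fewer retries on cases the rule does not vouch for.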
Journey Context:
Small models hallucinate parameters when schemas are nested or when results from earlier tool calls must be referenced in later calls. Developers often wrap Haiku in retry loops to repair malformed JSON, but the cumulative token cost of 2-3 retries can exceed the cost of a single Sonnet call. This applies specifically to agentic workflows with stateful tool use, not to simple API wrappers.
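To check the retry-cost claim against your own traffic, the arithmetic can be sketched as follows. It assumes a retry loop that resends the whole conversation on each attempt (the original prompt plus every earlier invalid output and an appended error message), so input tokens grow with each retry, and it assumes both models emit the same number of output tokens. `Pricing`, `call_cost`, `retry_loop_cost`, and `break_even_attempts` are hypothetical names, and the prices in the usage example are placeholders, not any provider's actual rates; substitute current list prices and measured token counts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pricing:
    """Price in USD per million tokens (placeholder values; use current rates)."""
    input_per_mtok: float
    output_per_mtok: float


def call_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single model call."""
    return (input_tokens * p.input_per_mtok + output_tokens * p.output_per_mtok) / 1_000_000


def retry_loop_cost(p: Pricing, prompt_tokens: int, output_tokens: int,
                    error_tokens: int, attempts: int) -> float:
    """Total cost of `attempts` calls when every retry resends the full
    conversation: the original prompt, each earlier invalid output, and the
    error message the loop appended after it (assumed loop design)."""
    total = 0.0
    for i in range(attempts):
        context = prompt_tokens + i * (output_tokens + error_tokens)
        total += call_cost(p, context, output_tokens)
    return total


def break_even_attempts(cheap: Pricing, strong: Pricing, prompt_tokens: int,
                        output_tokens: int, error_tokens: int,
                        max_attempts: int = 20) -> Optional[int]:
    """Smallest number of cheap-model attempts whose total cost exceeds one
    strong-model call, or None if that does not happen within max_attempts."""
    strong_once = call_cost(strong, prompt_tokens, output_tokens)
    for n in range(1, max_attempts + 1):
        if retry_loop_cost(cheap, prompt_tokens, output_tokens, error_tokens, n) > strong_once:
            return n
    return None


if __name__ == "__main__":
    # Placeholder prices and token counts; replace with your own measurements.
    cheap = Pricing(input_per_mtok=1.0, output_per_mtok=1.0)
    strong = Pricing(input_per_mtok=10.0, output_per_mtok=10.0)
    print(break_even_attempts(cheap, strong, prompt_tokens=1000,
                              output_tokens=100, error_tokens=50))
```

The sketch prices token spend only; it does not account for added latency or for loops in which every retry still fails.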
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:04:39.648464+00:00 | report_created | created