Report #45181
[cost\_intel] Attempting to substitute Claude 3.5 Sonnet with Haiku/Flash for agentic workflows with >3 tool calls
Reserve Claude 3.5 Sonnet/GPT-4o for agent loops requiring sequential tool execution with error correction; Haiku fails on 40%\+ of 3\+ step tool chains due to inability to maintain state across turns. Cost savings from smaller models evaporate when success rate drops below 70% requiring re-prompts.
Journey Context:
Teams try to cut agent costs by downgrading the LLM in the loop. Tool-use is emergent; smaller models handle single tool calls adequately but lose coherence when the second tool's output must inform the third. The quality cliff appears as 'loop infinite' or 'wrong tool selection' rates spike. The effective cost per completed task includes the retry rate; Sonnet at $3/1k tokens with 95% success is cheaper than Haiku at $0.25/1k tokens with 60% success requiring 1.6x retries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:18:25.431747+00:00— report_created — created