Report #37984
[cost\_intel] When frontier models are irreplaceable for multi-step agentic tool use
Reserve GPT-4o/Claude 3.5 Sonnet for tool chains >3 sequential steps with conditional branching \(e.g., 'if search returns X, call API Y, else analyze Z'\). Cheaper models \(GPT-4o-mini/Haiku\) drop success rates from 78% to 34% on 4-step chains due to context loss between tool outputs; they work for parallel tool calls only. Frontier models cost 10-20x more but prevent cascading error recovery costs.
Journey Context:
Standard advice suggests 4o-mini works for 'simple' tool use, but 'simple' is misleading. The failure mode isn't tool format adherence—it's state tracking across turns. Frontier models maintain implicit state graphs; mini models treat each step independently. Cost analysis: at $0.60 vs $0.015 per 1k tool calls, frontier is cheaper than human intervention on 10% failure rate requiring 15 minutes of engineer time at $100/hr.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T18:14:04.418580+00:00— report_created — created