Report #42461
[cost_intel] Frontier irreplaceability in agentic tool use: where do smaller models fail catastrophically on multi-step reasoning?
Do not use Haiku/Flash for agent loops that require more than 3 tool hops or implicit dependencies (e.g., 'if search result X then edit file Y'). With Sonnet/Pro at 85% SWE-bench success versus Haiku at 15%, the cost of retries exceeds what the cheaper model saves.
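The rule above can be sketched as a small routing check. This is a minimal illustration, not a tested router: `TaskProfile`, `choose_tier`, and the tier labels are hypothetical names introduced here, and it assumes you can estimate hop count and dependency structure before dispatching the task.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whatever model IDs your provider exposes.
SMALL_TIER = "small"
FRONTIER_TIER = "frontier"

# Threshold taken from the rule above: more than 3 tool hops goes to the frontier tier.
MAX_SMALL_MODEL_HOPS = 3


@dataclass
class TaskProfile:
    expected_tool_hops: int          # estimated number of sequential tool calls
    has_implicit_dependencies: bool  # e.g. "if search result X then edit file Y"


def choose_tier(task: TaskProfile) -> str:
    """Pick a model tier using the >3-hops / implicit-dependency rule."""
    if task.expected_tool_hops > MAX_SMALL_MODEL_HOPS or task.has_implicit_dependencies:
        return FRONTIER_TIER
    return SMALL_TIER
```

For example, `choose_tier(TaskProfile(expected_tool_hops=1, has_implicit_dependencies=False))` selects the small tier, while any task with a conditional step between tools is sent to the frontier tier regardless of hop count.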
Journey Context:
Engineers try to reduce agent costs by routing simple tasks to Haiku. In practice, 'simple' tool use (a single API call) works, but multi-step plans fail when Haiku hallucinates parameters, forgets context between steps, or produces JSON that does not match the tool schema. A Sonnet agent completes a 3-step task in 1 attempt ($0.09); a Haiku agent needs 12 retries with validation logic ($0.12 and 10x the latency). Once orchestration overhead is counted, the 'cheap' model is the more expensive one. Frontier models are irreplaceable for tasks that require: (1) tool chaining with conditional logic, (2) error recovery in unstructured environments, (3) schema adherence on the first try.
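The cost comparison above reduces to simple arithmetic, sketched below. The function names are hypothetical, and the $0.01 per-attempt figure for the small model is an assumption derived by dividing the reported $0.12 across 12 attempts; the report itself only gives the totals. Latency and engineering time for validation logic are not modeled.

```python
def cost_per_completed_task(cost_per_attempt: float, attempts: int) -> float:
    """Total spend to get one successful completion, counting every attempt."""
    return cost_per_attempt * attempts


def break_even_attempts(frontier_task_cost: float, small_cost_per_attempt: float) -> float:
    """Number of small-model attempts at which spend equals one frontier completion."""
    return frontier_task_cost / small_cost_per_attempt


# Figures from the report: one frontier attempt at $0.09.
frontier_cost = cost_per_completed_task(0.09, attempts=1)

# Assumption: $0.12 total over 12 attempts implies about $0.01 per attempt.
small_cost = cost_per_completed_task(0.01, attempts=12)

# The small model only saves money if it needs fewer attempts than this.
threshold = break_even_attempts(frontier_cost, 0.01)

print(f"frontier: ${frontier_cost:.2f}, small: ${small_cost:.2f}, "
      f"break-even attempts: {threshold:.1f}")
```

Under these assumed numbers, the small model stops being cheaper once it needs roughly nine attempts per task, which is why measuring attempts per completed task matters more than comparing per-token prices.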
Lifecycle
2026-06-19T01:44:30.052575+00:00— report_created — created