Report #77654
[cost_intel] Tasks where Claude 3.5 Sonnet or GPT-4o cannot be replaced by Haiku/Flash even at 10x cost
Sequential tool use with more than 3 dependent steps, ambiguous error recovery, or parallel tool orchestration that needs dynamic planning requires frontier models; cheaper models fail on error propagation from step 3 onward, with a >40% drop-off in task completion rates.
Journey Context:
Teams often chain Haiku calls to cut costs in agentic workflows (research, booking, coding). The failure mode is not single-step accuracy, since Haiku is about 95% reliable on isolated tool calls, but error accumulation. In a 3-step sequence (search → filter → book), Haiku's compound reliability is 0.95³ ≈ 86%, and, more critically, it cannot recover from step-2 ambiguity (e.g., 'which of these 3 hotels?'). Sonnet/GPT-4o maintain context across steps and ask for clarification. For parallel tool calls (call 5 APIs simultaneously, then synthesize the results), cheaper models miss cross-API constraints (e.g., 'the flight and hotel must be in the same city'). The cost math: 3 Haiku calls at $0.25/1M tokens total $0.75/1M against $3/1M for 1 Sonnet call, so Haiku is about 4x cheaper on paper. However, the error-recovery loop in Haiku often requires 2-3 retries; if each retry reruns the whole chain, the Haiku total rises to roughly $2.25-$3.00/1M, which is near break-even on tokens, and frontier models come out ahead on wall-clock time and success rate for workflows with more than 2 steps.
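The arithmetic in the paragraph above can be checked with a short sketch. It uses only the report's own figures (95% per-call reliability, $0.25/1M and $3/1M) and rests on three assumptions that are not in the report: per-step failures are independent, every call consumes the same number of tokens, and each retry reruns the whole chain. The function names are hypothetical. Under these assumptions a three-call Haiku chain costs $0.75 per 1M tokens and reaches the $3 Sonnet figure only after three full retries.

```python
# Figures quoted in the report; the retry model below is an assumption.
HAIKU_STEP_RELIABILITY = 0.95   # per-call reliability on isolated tool calls
HAIKU_PRICE_PER_M = 0.25        # USD per 1M tokens
SONNET_PRICE_PER_M = 3.00       # USD per 1M tokens


def chain_reliability(step_reliability: float, steps: int) -> float:
    """Probability that every step in a dependent chain succeeds,
    assuming per-step failures are independent."""
    return step_reliability ** steps


def chain_cost_per_m(price_per_m: float, calls: int, retries: int = 0) -> float:
    """Token cost (USD per 1M tokens per call) of a chain of `calls` calls,
    assuming each retry reruns the whole chain."""
    attempts = 1 + retries
    return price_per_m * calls * attempts


if __name__ == "__main__":
    for steps in (1, 3, 5):
        rate = chain_reliability(HAIKU_STEP_RELIABILITY, steps)
        print(f"{steps} step(s): {rate:.1%} chance the chain completes")

    for retries in (0, 2, 3):
        haiku = chain_cost_per_m(HAIKU_PRICE_PER_M, calls=3, retries=retries)
        print(f"{retries} retries: Haiku ${haiku:.2f} vs Sonnet ${SONNET_PRICE_PER_M:.2f}")
```

The sketch covers token cost only. It does not model the added latency of each retry, nor ambiguity that no amount of retrying resolves.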
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:56:39.749629+00:00— report_created — created