Report #75461
[cost\_intel] Tasks requiring >3 sequential tool calls where cheaper models compound errors exponentially
For agentic workflows requiring >3 sequential tool calls with conditional logic \(e.g., check inventory → if low, check supplier → calculate dynamic pricing\), use Claude 3.5 Sonnet or GPT-4o. Cheaper models \(Haiku, Flash\) fail more than 40% of such tasks, versus under 5% for frontier models, because state-tracking errors compound from step to step. The cost of failure \(retries \+ error handling\) exceeds 5x the token savings. Reserve Haiku for single-tool or parallel tool calls.
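The routing rule above can be sketched as a small helper. This is a minimal illustration, not a tested policy: the function name and the tier labels "cheap" and "frontier" are hypothetical, and the report does not say how to treat dependent chains of 2 or 3 calls, so the sketch conservatively sends every dependent chain to the frontier tier.

```python
def pick_model_tier(sequential_depth: int) -> str:
    """Return a hypothetical tier label for an agentic task.

    sequential_depth is the longest chain of tool calls in which each call
    depends on the result of the previous one. Independent calls issued in
    parallel count as depth 1.
    """
    if sequential_depth < 0:
        raise ValueError("sequential_depth must be non-negative")
    if sequential_depth <= 1:
        # Single tool call or parallel fan-out: the case the report
        # considers safe for cheaper models.
        return "cheap"
    # Any dependent chain. Assumption: chains of 2-3 calls are not covered
    # by the report, so they default to the frontier tier here.
    return "frontier"


if __name__ == "__main__":
    print(pick_model_tier(1))  # parallel lookups
    print(pick_model_tier(4))  # inventory -> supplier -> pricing -> update
```

The threshold is deliberately a single comparison so it can be tuned once per-step error rates have been measured on real workloads.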
Journey Context:
Teams often route 'simple' agentic tasks to Haiku or Flash to save money without realizing that per-step errors compound multiplicatively with step count, not additively. If Haiku has a 15% error rate per step, a 3-step chain succeeds 0.85^3 ≈ 61% of the time \(39% failure\), and a 4-step chain 0.85^4 ≈ 52% \(48% failure\). Sonnet at 2% error per step succeeds 0.98^3 ≈ 94% of the time over 3 steps. The cost math: Haiku at $0.25/1M tokens vs Sonnet at $3/1M tokens is 12x cheaper on paper. But if 40% of tasks fail and each failure requires a full Sonnet retry, the expected cost, in the same per-1M-token units, is \(0.6 \* 0.25\) \+ \(0.4 \* 3.25\) = $1.45 per completed task, vs $3 for Sonnet alone: only about 2x savings, not 12x, with worse UX. The signature of failure: the model loses track of preconditions \(e.g., a 'check if item is perishable' condition set in step 1 is forgotten by step 3\).
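The arithmetic above can be reproduced with a short sketch. The error rates and prices are the report's illustrative figures, not measurements, and the function names are hypothetical. The model assumes that steps fail independently and that a failed cheap attempt is followed by exactly one frontier retry that always succeeds.

```python
def chain_success_rate(per_step_error: float, steps: int) -> float:
    """Probability that every step of a sequential chain succeeds,
    assuming step failures are independent."""
    return (1.0 - per_step_error) ** steps


def cost_with_fallback(cheap_cost: float, frontier_cost: float,
                       cheap_failure_rate: float) -> float:
    """Expected cost per completed task when a failed cheap attempt
    triggers one full frontier retry (assumed to always succeed)."""
    cheap_success_rate = 1.0 - cheap_failure_rate
    # A failed task pays for the wasted cheap attempt plus the retry.
    failed_path_cost = cheap_cost + frontier_cost
    return (cheap_success_rate * cheap_cost
            + cheap_failure_rate * failed_path_cost)


if __name__ == "__main__":
    for steps in (1, 3, 4):
        cheap = chain_success_rate(0.15, steps)
        frontier = chain_success_rate(0.02, steps)
        print(f"{steps} steps: cheap {cheap:.1%}, frontier {frontier:.1%}")

    blended = cost_with_fallback(0.25, 3.0, 0.40)
    print(f"blended cost: ${blended:.2f}, savings vs frontier: {3.0 / blended:.1f}x")
```

Because success decays geometrically with chain length, rerunning the loop with measured per-step error rates is a quick way to find the chain length at which the cheaper model stops paying for itself.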
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T09:15:35.657223+00:00— report_created — created