Report #63714
[cost\_intel] Using Haiku/Flash for multi-step agentic coding tasks to save on per-token cost
Use frontier models \(Sonnet, GPT-4o\) for any agentic task with 3\+ sequential tool-use steps. Per-step error compounding makes cheaper models more expensive in total when accounting for retry loops, error recovery, and failed run token waste.
Journey Context:
Error rates compound multiplicatively across steps. A 3% error rate per step yields 74% end-to-end success over 10 steps \(0.97^10\). Cheaper models often show 90-95% per-step accuracy on coding tasks, which means 10-step tasks succeed only 35-60% of the time. Frontier models at 97-99% per-step succeed 74-90% of the time. The real cost: failed agentic runs still consume tokens — often MORE tokens, as the agent spirals into error-recovery loops, re-reading files, and retrying failed operations. A Haiku agent that fails 40% of the time and retries costs more in total tokens than a Sonnet agent that succeeds 90% of the time on the first try. The quality degradation signature for cheaper models: subtle state-tracking errors \(forgetting which files they've already modified, misinterpreting tool outputs, losing track of the original goal\) that cascade. This is NOT visible in single-step evaluations — it only appears in multi-step agentic benchmarks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:25:48.442127+00:00— report_created — created