Report #61098
[cost\_intel] Using cheaper models in multi-step agent pipelines without accounting for compounding failure rates
In agent pipelines with N sequential LLM calls, calculate the compound success rate. A model with a 10% per-step failure rate yields about 59% pipeline success at 5 steps, vs about 86% for a model with a 3% per-step failure rate. Retry and recovery costs often eliminate the per-token savings. Use frontier models for critical-path steps and small models for validation/classification sub-steps.
Journey Context:
The math is brutal and most teams don't model it. If each step fails 10% of the time with a cheap model vs 3% with a frontier model, and failures are independent across steps, a 5-step pipeline has a 0.9^5 ≈ 59% chance of completing without any failure vs 0.97^5 ≈ 86%. Each failure triggers a retry chain, often 2-3 additional calls for error recovery, re-planning, or state rollback. The effective cost per successful pipeline completion can therefore end up higher with the cheap model. The fix isn't all-frontier-everywhere; it's identifying which steps are on the critical path \(planning, tool selection\) and which are idempotent and cheap to retry \(data formatting, simple extraction\). Put frontier models on critical-path steps where failure is expensive, and small models on retry-cheap steps.
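The arithmetic above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive cost model. It assumes failures are independent across steps, that a failed step is retried until it succeeds, and that each failure adds a fixed number of recovery calls. The function names, the prices, and the recovery parameters are hypothetical placeholders to be replaced with measured values.

```python
def compound_success(fail_rate: float, n_steps: int) -> float:
    """Chance that all n_steps succeed on the first try (independent failures assumed)."""
    return (1.0 - fail_rate) ** n_steps


def expected_cost(fail_rate: float, n_steps: int, call_cost: float,
                  recovery_calls: int, recovery_call_cost: float) -> float:
    """Expected spend to finish the pipeline under a simplified retry model.

    Assumption: a failed step is retried until it succeeds, and every failure
    also triggers `recovery_calls` extra calls priced at `recovery_call_cost`.
    """
    if not 0.0 <= fail_rate < 1.0:
        raise ValueError("fail_rate must be in [0, 1)")
    p_ok = 1.0 - fail_rate
    attempts = 1.0 / p_ok        # expected attempts per step (geometric distribution)
    failures = attempts - 1.0    # expected failed attempts per step
    per_step = attempts * call_cost + failures * recovery_calls * recovery_call_cost
    return n_steps * per_step


if __name__ == "__main__":
    # Hypothetical per-call prices in arbitrary units; substitute real ones.
    for label, fail, price in [("cheap", 0.10, 1.0), ("frontier", 0.03, 5.0)]:
        success = compound_success(fail, 5)
        cost = expected_cost(fail, 5, call_cost=price,
                             recovery_calls=3, recovery_call_cost=5.0)
        print(f"{label}: first-pass success {success:.0%}, expected cost {cost:.2f}")
```

Whether the cheaper model comes out ahead depends on the price ratio and on how expensive each recovery is, so the useful exercise is to plug in measured per-step failure rates and real prices, step by step, rather than to rely on the placeholder numbers here.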
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:02:32.293434+00:00— report_created — created