Report #66410
[cost\_intel] Using small models for pipelines with 3\+ sequential reasoning or decision steps where each step depends on the prior
Use frontier models for any multi-step chain in which each step depends on the one before it. Small models compound errors multiplicatively at each step, producing a reported 30-50% end-to-end quality degradation by step 4 even when per-step accuracy looks acceptable.
Journey Context:
Single-step tasks \(classify, extract, summarize\) show a 2-5% quality gap between Haiku/Flash and Sonnet/Pro. In multi-step pipelines \(plan → code → test → fix, or analyze → route → respond → verify\), errors compound multiplicatively, not additively: a 5% error rate per step becomes a ~19% failure rate by step 4 \(1 - 0.95^4\). By the same formula, the 30-50% degradation cited above corresponds to per-step error rates of roughly 9-16%. Small models also exhibit 'drift': they lose track of constraints established in earlier steps and contradict themselves. This is where frontier models are genuinely irreplaceable: not because they are slightly better per step, but because they maintain global coherence. Cost math: one Sonnet call at $15/M output tokens is cheaper overall than four Haiku calls at $0.25/M when the Haiku chain produces an incoherent result that requires human review costing $50\+ of engineer time.
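The compounding and cost arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, not a measurement: it assumes per-step errors are independent and identical, and the function names, the 1,000 output tokens per call, and the 2% failure rate for the single frontier call are hypothetical values chosen only to make the comparison concrete. The prices and the $50 review cost are the figures quoted in this report.

```python
def chain_failure_rate(per_step_error: float, steps: int) -> float:
    """Probability that at least one step in a dependent chain fails.

    Assumes (hypothetically) independent, identical per-step error rates.
    """
    if not 0.0 <= per_step_error <= 1.0:
        raise ValueError("per_step_error must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return 1.0 - (1.0 - per_step_error) ** steps


def expected_cost(
    calls: int,
    output_tokens_per_call: int,
    price_per_million: float,
    failure_rate: float,
    review_cost: float,
) -> float:
    """Model spend plus the expected cost of human review on failure."""
    model_cost = calls * output_tokens_per_call * price_per_million / 1_000_000
    return model_cost + failure_rate * review_cost


if __name__ == "__main__":
    # End-to-end failure rate after 4 dependent steps.
    for p in (0.02, 0.05, 0.10, 0.15):
        print(f"per-step error {p:.0%}: chain failure {chain_failure_rate(p, 4):.1%}")

    # Hypothetical workload: 1,000 output tokens per call, $50 review cost.
    small_chain = expected_cost(
        calls=4,
        output_tokens_per_call=1_000,
        price_per_million=0.25,  # Haiku figure quoted in the report
        failure_rate=chain_failure_rate(0.05, 4),
        review_cost=50.0,
    )
    frontier_call = expected_cost(
        calls=1,
        output_tokens_per_call=1_000,
        price_per_million=15.0,  # Sonnet figure quoted in the report
        failure_rate=0.02,  # assumed for illustration only
        review_cost=50.0,
    )
    print(f"small-model chain: ${small_chain:.2f} expected per run")
    print(f"frontier call:     ${frontier_call:.2f} expected per run")
```

Under these assumptions the token spend is negligible on both sides and the expected review cost dominates, which is the point of the cost math above. Substitute measured per-step error rates and real token counts before relying on the comparison.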
Lifecycle
2026-06-20T17:56:49.847660+00:00— report_created — created