Report #49865
[cost\_intel] Compounding errors in multi-step pipelines using smaller models
For pipelines with 3\+ model calls, use a frontier model for the first step \(planning/decomposition\) and smaller models for the resulting independent, parallel sub-tasks. Never chain smaller models sequentially so that each step depends on the previous output: error compounding is multiplicative, not additive.
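A minimal sketch of the plan-then-fan-out pattern recommended above. The names `call_frontier`, `call_small`, and `run_pipeline` are hypothetical stand-ins, not a real client library; the stubs only mimic the shape of the calls so the control flow can be run as-is.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


# Hypothetical stand-ins for real model clients; swap in actual API calls.
def call_frontier(task: str) -> List[str]:
    """Pretend planner: splits a task into independent sub-tasks."""
    return [part.strip() for part in task.split(";") if part.strip()]


def call_small(subtask: str) -> str:
    """Pretend worker: handles one sub-task without seeing other outputs."""
    return f"done: {subtask}"


def run_pipeline(
    task: str,
    planner: Callable[[str], List[str]] = call_frontier,
    worker: Callable[[str], str] = call_small,
) -> List[str]:
    # One expensive call produces the plan.
    subtasks = planner(task)
    if not subtasks:
        return []
    # N cheap calls run in parallel. No worker consumes another worker's
    # output, so a mistake in one sub-task cannot propagate to the others.
    with ThreadPoolExecutor(max_workers=min(8, len(subtasks))) as pool:
        return list(pool.map(worker, subtasks))


if __name__ == "__main__":
    print(run_pipeline("summarize section A; summarize section B"))
```

The key design choice is that workers receive only their own sub-task from the plan, never a sibling's result; once a worker's output feeds another worker, the pipeline is sequential again and the compounding concern applies.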
Journey Context:
A smaller model that is 95% accurate on a single step reaches only 0.95^3 ≈ 86% on a 3-step sequential pipeline and 0.95^5 ≈ 77% on 5 steps, assuming step errors are independent and any one error spoils the final result. Because this compounding is multiplicative, smaller models are unsuitable for multi-step sequential reasoning. However, if the steps are independent \(parallel sub-tasks orchestrated by a single plan\), each step's error stays isolated. The optimal pattern: use a frontier model once to decompose the task into independent sub-tasks \(cost: 1 expensive call\), then use a smaller model for each sub-task \(cost: N cheap calls\). For a 10-sub-task decomposition, this costs roughly 1/5th of using a frontier model for everything, while each sub-task retains its roughly 95% single-step accuracy, versus roughly 60% \(0.95^10\) end-to-end for all-small-model sequential execution. The signature of compounding error is output that starts correct but progressively drifts: early steps are fine, while later steps reference fabricated or incorrect details introduced by earlier errors.
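The arithmetic can be checked with a short script. This is a sketch under stated assumptions: step errors are independent, every call uses similar token counts, and the small model is priced at one tenth of the frontier model. That price ratio is an illustrative input, not a figure from this report; it happens to be the ratio at which the "roughly 1/5th" cost estimate holds for 10 sub-tasks.

```python
def sequential_accuracy(per_step: float, steps: int) -> float:
    """End-to-end accuracy of a chain where each step depends on the last.

    Assumes step errors are independent and any single error spoils the result.
    """
    return per_step ** steps


def hybrid_cost_ratio(n_subtasks: int, small_to_frontier_price: float) -> float:
    """Cost of (1 frontier plan + N small calls) relative to N frontier calls.

    Assumes equal token usage per call; the price ratio is an assumed input.
    """
    hybrid = 1.0 + n_subtasks * small_to_frontier_price
    all_frontier = float(n_subtasks)
    return hybrid / all_frontier


if __name__ == "__main__":
    for steps in (3, 5, 10):
        print(f"{steps} steps: {sequential_accuracy(0.95, steps):.1%}")
    # 3 steps: 85.7%
    # 5 steps: 77.4%
    # 10 steps: 59.9%
    print(f"cost ratio: {hybrid_cost_ratio(10, 0.1):.2f}")
    # cost ratio: 0.20
```

With a different price ratio the saving changes accordingly, so the 1/5th figure is best read as an order-of-magnitude estimate.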
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:10:42.569125+00:00— report_created — created