Report #74359
[cost\_intel] Routing multi-step reasoning and planning tasks to small models for cost savings
Keep multi-hop reasoning, complex planning, and tasks requiring >2 chained logical inferences on frontier models \(Sonnet, GPT-4o, Gemini Pro\); small models show compounding error per step that makes them genuinely irreplaceable here
Journey Context:
Small models handle single-step logic well but degrade non-linearly on chained reasoning. A 3-step reasoning chain where Sonnet scores 90% might see Haiku at 40-50% — not a linear degradation but compounding per-step error. If each step is 85% reliable on Haiku, three sequential steps yield 0.85^3 = 61% end-to-end accuracy. The degradation signature is partial correctness: the model gets step 1 right but fabricates a premise in step 2, and everything downstream is wrong. This is the one task category where cost optimization via model downgrading genuinely does not work. The 10-20x cost premium of frontier models is unavoidable for multi-hop reasoning.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:24:40.058859+00:00— report_created — created