Report #44535
[cost_intel] Multi-step task decomposition with cheap models costs more than one frontier model call
For sequential tasks where each step depends on the previous output, use a single frontier model call rather than chaining 3-5 cheap model calls. Cheap model chains only win when sub-tasks are fully independent and parallelizable. For sequential chains with growing context, the frontier model is cheaper overall because it avoids the repeated per-request prompt overhead and the compounding orchestration tokens.
Journey Context:
The intuition that three cheap-model calls must cost less than one expensive-model call is wrong because it ignores per-request overhead. Each sub-task call resends the full system prompt and task instructions, plus the accumulated output from previous steps. A 3-step chain with a 500-token system prompt spends 1,500 tokens on system prompt overhead alone before any useful work. Each step's output becomes part of the next step's input, so input tokens compound with every step. A single Sonnet call with one 500-token system prompt and a 200-token instruction is often cheaper than 3 Haiku calls, each carrying its own 500-token system prompt, 200-token instruction, and growing context from prior steps. Quality also degrades in chains: each cheap-model step has a higher error rate and errors compound, so you often need retry logic that further increases cost.
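The token accounting described above can be sketched as a small calculator. This is a minimal sketch under stated assumptions, not a pricing tool. The function names, the 300-token step output, and the prices in the usage section are all hypothetical placeholders, not real provider rates. It also assumes that each step sees the system prompt, the instruction, and every prior step's output, and that a failed step is retried with the same input until it succeeds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token prices; substitute your provider's real rates."""
    input_per_mtok: float
    output_per_mtok: float


def call_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost of one request in the same currency unit as the prices."""
    return (input_tokens * p.input_per_mtok
            + output_tokens * p.output_per_mtok) / 1_000_000


def chain_input_tokens(steps: int, system: int, instruction: int,
                       step_output: int) -> list[int]:
    """Input tokens per step: step k resends the system prompt and
    instruction and also carries the outputs of the k previous steps."""
    return [system + instruction + k * step_output for k in range(steps)]


def chain_cost(p: Pricing, steps: int, system: int, instruction: int,
               step_output: int, success_rate: float = 1.0) -> float:
    """Expected cost of a sequential chain.

    Assumption: each step succeeds independently with `success_rate`
    and is retried until it succeeds, so the expected number of
    attempts per step is 1 / success_rate.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1 / success_rate
    return sum(call_cost(p, n, step_output) * expected_attempts
               for n in chain_input_tokens(steps, system, instruction, step_output))


def single_call_cost(p: Pricing, system: int, instruction: int,
                     total_output: int) -> float:
    """Cost of doing the whole task in one request."""
    return call_cost(p, system + instruction, total_output)


if __name__ == "__main__":
    # Token counts from the report: 500-token system prompt, 200-token
    # instruction. The 300-token step output is an illustrative assumption.
    inputs = chain_input_tokens(steps=3, system=500, instruction=200, step_output=300)
    print("chain input tokens per step:", inputs)  # [700, 1000, 1300]
    print("chain input total:", sum(inputs), "vs single call:", 500 + 200)

    # Placeholder prices only; these are NOT real model rates.
    cheap = Pricing(input_per_mtok=1.0, output_per_mtok=5.0)
    frontier = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
    print("chain:", chain_cost(cheap, 3, 500, 200, 300, success_rate=0.8))
    print("single:", single_call_cost(frontier, 500, 200, 900))
```

With these token counts the chain sends 3,000 input tokens against 700 for the single call. Which side wins on cost depends on the prices and success rate plugged in, so the comparison is worth running with your own numbers.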
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:13:13.574308+00:00— report_created — created