Report #39978
[cost_intel] Using chain-of-thought prompting with small models to compensate for reasoning limitations
For multi-step reasoning tasks, a single frontier model call with minimal CoT is often cheaper and more reliable than a small model call with extensive CoT. The extra output tokens that CoT generates on a small model can erase most of its per-token price advantage, and in some cases cost more than the frontier model's direct answer.
Journey Context:
The instinct: small models are 20x cheaper per token, so add chain-of-thought to compensate. The reality: CoT adds 5-20x more output tokens, which eats most of that discount. A Sonnet call that directly answers a reasoning question in 100 output tokens costs $0.003. A Haiku call that uses 2,000 tokens of CoT to reach the same answer costs $0.002, which is only 33% cheaper, not 20x cheaper. The Haiku CoT answer is also more likely to contain a reasoning error somewhere in the chain, producing a confident wrong answer. For tasks requiring genuine multi-step reasoning (math, logic, complex analysis), frontier models without CoT often outperform small models with CoT at similar or lower total cost. Save small models for tasks where the reasoning depth is shallow: classification, extraction, simple transformation.
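The arithmetic above can be checked against your own prices with a few lines of code. The following is a minimal sketch, not a pricing tool: the function names are hypothetical, the prices in the example are placeholders rather than real vendor rates, and the cost-per-correct-answer helper assumes that wrong answers are detected and retried independently, which is a simplification.

```python
"""Rough cost comparison: frontier direct answer vs. small model with CoT.

All names here are illustrative. Prices and accuracies passed to these
helpers are hypothetical placeholders, not real vendor pricing.
"""


def call_cost(input_tokens, output_tokens, in_price_per_mtok, out_price_per_mtok):
    """Dollar cost of one call, with prices given per million tokens."""
    return (input_tokens * in_price_per_mtok
            + output_tokens * out_price_per_mtok) / 1_000_000


def break_even_output_tokens(frontier_output_tokens, frontier_out_price, small_out_price):
    """Small-model output tokens at which its output cost equals the frontier call's."""
    return frontier_output_tokens * frontier_out_price / small_out_price


def cost_per_correct_answer(cost_per_call, accuracy):
    """Expected spend per correct answer.

    Assumes each wrong answer is detected and retried independently,
    so the expected number of calls is 1 / accuracy.
    """
    if not 0 < accuracy <= 1:
        raise ValueError("accuracy must be in (0, 1]")
    return cost_per_call / accuracy


if __name__ == "__main__":
    # Placeholder output prices with a 20x ratio (dollars per million tokens).
    FRONTIER_OUT, SMALL_OUT = 20.0, 1.0

    frontier = call_cost(0, 100, 0.0, FRONTIER_OUT)   # direct answer, 100 tokens
    small = call_cost(0, 2000, 0.0, SMALL_OUT)        # CoT answer, 2,000 tokens
    limit = break_even_output_tokens(100, FRONTIER_OUT, SMALL_OUT)

    print(f"frontier direct: ${frontier:.4f}")
    print(f"small with CoT:  ${small:.4f}")
    print(f"break-even CoT length: {limit:.0f} output tokens")
```

With a 20x price ratio and a 100-token direct answer, the break-even point is 2,000 small-model output tokens; any CoT longer than that costs more than the frontier call on output alone. Dividing each call cost by a measured accuracy via `cost_per_correct_answer` shifts the comparison further whenever the small model's chain is less reliable.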
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:34:36.913178+00:00— report_created — created