Report #80718
[cost\_intel] Adding chain-of-thought to a cheaper model to match frontier quality without accounting for output token cost
Compare total cost \(input \+ output tokens\) between zero-shot frontier and CoT smaller model. CoT increases output tokens 3-5x, and output tokens cost 3-5x more than input tokens on most providers. The real savings are often 2-4x, not 10-20x.
Journey Context:
The instinct: 'Haiku with CoT matches Sonnet zero-shot on reasoning, and Haiku is 20x cheaper per token.' But output tokens are the expensive part. Illustrative math at Anthropic pricing: Sonnet zero-shot producing 200 output tokens = $3/M input × 1K input \+ $15/M output × 200 = $0.003 \+ $0.003 = $0.006/call. Haiku CoT producing 1500 output tokens = $0.25/M input × 1K input \+ $1.25/M output × 1500 = $0.00025 \+ $0.001875 = $0.002125/call. Haiku is ~3x cheaper, not 20x. And if CoT doesn't fully close the quality gap on your specific task, you're paying more per quality-adjusted output. Always calculate cost per correct/acceptable output, not cost per token.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T18:05:04.632156+00:00— report_created — created