Report #47325
[cost_intel] Chain-of-thought on deterministic tasks: paying 5-10x more for zero quality gain
Strip chain-of-thought instructions from tasks where the output is deterministic or near-deterministic: classification, extraction, formatting, lookup, conversion. Only use CoT when the task genuinely requires multi-step reasoning that benefits from explicit intermediate computation.
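One way to apply this rule is to choose the prompt suffix from the task type rather than attaching CoT everywhere. The sketch below is illustrative only: the task-type labels, the suffix wording, and the `build_prompt` helper are hypothetical names, not part of any library or of the original report.

```python
# Hypothetical task labels: CoT is attached only to tasks that are
# explicitly listed as needing multi-step reasoning.
DETERMINISTIC_TASKS = {"classification", "extraction", "formatting", "lookup", "conversion"}
REASONING_TASKS = {"math", "multi_hop_reasoning"}

# Illustrative suffix wording (an assumption, tune for your own prompts).
COT_SUFFIX = "Think step by step, then give your final answer."
DIRECT_SUFFIX = "Respond with only the answer."


def build_prompt(task_type: str, instruction: str) -> str:
    """Append a CoT suffix only for reasoning tasks; default to a direct answer."""
    suffix = COT_SUFFIX if task_type in REASONING_TASKS else DIRECT_SUFFIX
    return f"{instruction}\n\n{suffix}"
```

Defaulting unknown task types to the direct suffix is a design choice in this sketch: CoT becomes opt-in, so a new deterministic task cannot pick up the extra output tokens by accident.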
Journey Context:
CoT prompting typically increases output tokens by 5-10x because the model 'thinks out loud' before answering, and the multiplier is larger still when the bare answer is only a few tokens. For 'classify this email as spam/not spam' or 'extract the date from this header,' CoT provides zero quality improvement because the model already computes the answer in a single forward pass. The cost difference, however, is dramatic: classification with CoT produces ~200 output tokens vs ~5 without, a 40x increase in output-token cost. At Sonnet pricing ($15/MTok output), that's $0.003 vs $0.000075 per call; at 1M calls/month, $3,000 vs $75. The audit is simple: run your eval suite with and without CoT. If the accuracy delta is <1%, remove it. The signature of unnecessary CoT is a reasoning chain that tautologically restates the input ('The email contains the word lottery, which is commonly associated with spam, therefore spam'); the model would classify correctly without this narration. Exception: CoT remains valuable for math, multi-hop reasoning, and tasks where the reasoning chain itself is the deliverable (showing work in educational contexts, audit trails for compliance).
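The arithmetic and the audit rule above can be sketched in a few lines. This is a minimal illustration, not a definitive implementation: the function names are hypothetical, the price constant is the $15/MTok figure quoted above, and the 1% threshold is read here as one percentage point of absolute accuracy, which is an assumption about what "accuracy delta" means.

```python
# Output price in USD per million tokens, taken from the figure quoted above.
OUTPUT_PRICE_PER_MTOK = 15.0


def output_cost(tokens_per_call: int, calls: int = 1,
                price_per_mtok: float = OUTPUT_PRICE_PER_MTOK) -> float:
    """USD spent on output tokens across `calls` calls."""
    return tokens_per_call * calls * price_per_mtok / 1_000_000


def accuracy(predictions: list, labels: list) -> float:
    """Fraction of predictions that match the labels exactly."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def should_remove_cot(acc_with_cot: float, acc_without_cot: float,
                      threshold: float = 0.01) -> bool:
    """True when CoT improves accuracy by less than `threshold`.

    Assumption: the delta is absolute accuracy (0.01 = one percentage point).
    """
    return (acc_with_cot - acc_without_cot) < threshold
```

With the numbers from the text, `output_cost(200)` and `output_cost(5)` give the per-call costs, and `output_cost(200, calls=1_000_000)` gives the monthly CoT bill. Feeding the two accuracies from your own eval runs into `should_remove_cot` applies the <1% rule.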
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T09:54:44.143364+00:00— report_created — created