Report #64073
[counterintuitive] Why does chain-of-thought prompting make the model worse on some tasks?
Do not reflexively add chain-of-thought to every prompt. Reserve chain-of-thought (CoT) for tasks that genuinely require multi-step reasoning, where the model has the individual sub-capabilities but needs scaffolding to compose them. Skip CoT for tasks the model can already do in one step, tasks that rely on pattern recognition rather than deliberative reasoning, and tasks where the model's priors are strong and CoT might override correct intuitions with plausible-sounding but wrong reasoning chains.
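The selection rule above can be sketched as a small gate in front of prompt construction. This is a minimal sketch under stated assumptions: `TaskProfile`, `should_use_cot`, and `build_prompt` are hypothetical names, the boolean task traits would in practice come from your own judgment or a labeled evaluation, and "Let's think step by step." is just one common zero-shot CoT cue, not a required phrase.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Hypothetical description of a task; every field is an illustrative assumption."""
    needs_multi_step_reasoning: bool  # answer requires composing several sub-results
    has_sub_capabilities: bool        # model can solve each sub-problem on its own
    solvable_in_one_step: bool        # model already answers reliably without scaffolding
    pattern_recognition: bool         # task is recognition-like rather than deliberative
    strong_priors: bool               # model's direct intuitions are usually right here


def should_use_cot(task: TaskProfile) -> bool:
    """Use CoT only for multi-step tasks whose parts the model can already do."""
    # Any of these traits means the scaffold is more likely to add noise than signal.
    if task.solvable_in_one_step or task.pattern_recognition or task.strong_priors:
        return False
    # CoT helps composition, but cannot supply a missing sub-capability.
    return task.needs_multi_step_reasoning and task.has_sub_capabilities


def build_prompt(question: str, task: TaskProfile) -> str:
    """Append a reasoning cue only when should_use_cot() says it is warranted."""
    if should_use_cot(task):
        return f"{question}\n\nLet's think step by step."
    return f"{question}\n\nAnswer directly."
```

For example, a multi-step word problem whose individual arithmetic steps the model handles well would get the reasoning cue, while a sentiment label the model already gets right in one shot would get the hypothetical "Answer directly." instruction.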
Journey Context:
The widespread belief is that chain-of-thought prompting always helps, or at least never hurts. The original CoT paper by Wei et al. itself notes that CoT does not positively impact performance on tasks where standard prompting already works well, and subsequent research has shown that CoT can actively degrade performance.
The mechanism has three parts: CoT forces the model through a reasoning path in which each step can introduce an error (and those errors compound), it can override correct fast intuitions with slow but wrong deliberation, and it can lead the model to second-guess answers that were already correct. CoT is beneficial when the task decomposes into sub-problems the model can solve individually but not in a single forward pass. It is harmful when the task is already within the model's direct capability, or when the reasoning introduces more noise than signal.
The mental model: CoT is a computational scaffold, not a universal amplifier. It trades direct pattern-matching (fast, sometimes right) for sequential reasoning (slow, and sometimes more wrong because errors accumulate at each step).
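The error-compounding argument can be made concrete with a toy model. Assume (a simplification, not a claim about any real model) that a chain has n steps, each correct independently with probability p, and that any wrong step spoils the final answer; the chain is then correct with probability p^n. The hypothetical helpers `chain_accuracy` and `cot_expected_to_help` compare that figure with the model's direct, one-step accuracy.

```python
def chain_accuracy(step_accuracy: float, num_steps: int) -> float:
    """Toy model: probability an n-step chain ends correct, assuming every step
    must be correct and steps fail independently with the same probability."""
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be in [0, 1]")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_accuracy ** num_steps


def cot_expected_to_help(direct_accuracy: float, step_accuracy: float, num_steps: int) -> bool:
    """True when the toy chain model beats answering in a single step."""
    return chain_accuracy(step_accuracy, num_steps) > direct_accuracy


if __name__ == "__main__":
    # Ten steps at 95% each leave roughly a 60% chance the whole chain is right.
    print(f"{chain_accuracy(0.95, 10):.3f}")  # 0.599
    # Task already within direct capability: the chain loses.
    print(cot_expected_to_help(direct_accuracy=0.90, step_accuracy=0.95, num_steps=10))  # False
    # Task the model cannot do in one pass: the chain wins.
    print(cot_expected_to_help(direct_accuracy=0.30, step_accuracy=0.95, num_steps=10))  # True
```

Under these assumptions, a task the model already answers directly 90% of the time gets worse with a ten-step chain, while one it answers directly only 30% of the time improves. Real reasoning chains can self-correct and their step errors are correlated, so treat this as intuition for the trade-off rather than a prediction.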
Lifecycle
2026-06-20T14:01:52.588866+00:00— report_created — created