Report #51816
[counterintuitive] Adding chain-of-thought prompting always improves the model's reasoning accuracy
Use chain-of-thought (CoT) selectively: it helps on multi-step reasoning, arithmetic, and logic tasks where the answer requires intermediate computation. It can hurt on tasks where the model has strong direct intuitions (simple classification, factual recall, pattern matching). Always evaluate your specific task both with and without CoT before defaulting to it.
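One way to run the with/without comparison is a small paired evaluation over a labeled sample of your task. The sketch below is illustrative only: `compare_cot`, `extract_answer`, the two prompt templates, and the `ask_model` callable are hypothetical names introduced here, not part of any library. It assumes `ask_model` takes a prompt string and returns the completion text, and that the final answer is on the last non-empty line of the completion.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical prompt templates; adjust the wording for your own task.
DIRECT_TEMPLATE = "{question}\nAnswer with only the final answer."
COT_TEMPLATE = (
    "{question}\nThink step by step, then give the final answer on the last line."
)


def extract_answer(completion: str) -> str:
    """Treat the last non-empty line as the final answer (a simplifying assumption)."""
    lines = [line.strip() for line in completion.splitlines() if line.strip()]
    return lines[-1].lower() if lines else ""


def compare_cot(
    ask_model: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Score the same examples with direct and CoT prompts.

    `ask_model` is a placeholder for whatever function calls your model.
    Returns both accuracies plus the paired disagreement counts, which show
    where CoT fixed an answer and where it broke one.
    """
    total = direct_hits = cot_hits = only_direct = only_cot = 0
    for question, expected in examples:
        gold = expected.strip().lower()
        direct_out = ask_model(DIRECT_TEMPLATE.format(question=question))
        cot_out = ask_model(COT_TEMPLATE.format(question=question))
        direct_ok = extract_answer(direct_out) == gold
        cot_ok = extract_answer(cot_out) == gold

        total += 1
        direct_hits += direct_ok
        cot_hits += cot_ok
        only_direct += direct_ok and not cot_ok  # CoT talked itself out of it
        only_cot += cot_ok and not direct_ok     # CoT rescued a wrong answer

    if total == 0:
        raise ValueError("examples must contain at least one item")
    return {
        "direct_accuracy": direct_hits / total,
        "cot_accuracy": cot_hits / total,
        "only_direct_correct": only_direct,
        "only_cot_correct": only_cot,
    }
```

The paired counts matter as much as the headline accuracies: two prompts can score the same overall while disagreeing on many individual examples, which is itself a sign that CoT is changing answers rather than uniformly improving them. Exact string matching is a deliberately crude scorer; swap in whatever answer normalization your task needs.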
Journey Context:
Chain-of-thought became a default best practice after dramatic improvements on math and reasoning benchmarks. But the effect is task-dependent and sometimes negative. For tasks where the model already 'knows' the answer via direct pattern matching, forcing step-by-step reasoning can introduce errors. This is analogous to asking a person to explain why they recognize a face, which can reduce recognition accuracy. Mistakes in the generated reasoning steps can compound, or the model can talk itself out of the correct answer by exploring plausible-but-wrong reasoning paths. Furthermore, research on 'faithfulness' shows that the model's stated reasoning in CoT does not always reflect its actual computation path: the model can arrive at a correct answer through incorrect reasoning, sometimes by luck. CoT is a powerful tool for certain tasks, not a universal accuracy upgrade.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T17:28:01.636385+00:00— report_created — created