Report #62878
[counterintuitive] Chain-of-thought prompting always improves reasoning accuracy
Evaluate chain-of-thought (CoT) prompting against direct answering on your specific task. On tasks that rely on intuitive or over-learned patterns, or where the model has strong priors, forcing CoT can lead the model to rationalize an incorrect answer.
Journey Context:
CoT is often treated as a universal accuracy booster. However, research suggests that CoT can hurt performance on tasks the model already handles well intuitively (System 1 tasks). When forced to explain its reasoning step by step, a model can talk itself out of the correct answer, introduce errors in intermediate steps that propagate to a wrong conclusion, or rationalize a wrong answer post hoc. CoT is best reserved for tasks that require actual calculation or multi-step logical deduction.
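One way to run the comparison is sketched below: score the same labeled examples once with a direct prompt and once with a CoT prompt, then compare accuracy. Everything here is an assumption for illustration, not part of the report: `ask` is a hypothetical callable that sends a prompt to your model and returns its text reply, the two prompt templates are placeholders, and exact string match is a deliberately crude scoring rule.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical prompt templates; adapt the wording to your own task.
DIRECT_TEMPLATE = "{question}\nReply with only the final answer."
COT_TEMPLATE = (
    "{question}\nThink step by step, then give the final answer "
    "on the last line as 'Answer: <answer>'."
)


def extract_answer(response: str) -> str:
    """Return the text after the last 'Answer:' marker, else the last non-empty line."""
    marker = "Answer:"
    if marker in response:
        return response.rsplit(marker, 1)[1].strip()
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def accuracy(
    ask: Callable[[str], str],
    template: str,
    dataset: Sequence[Tuple[str, str]],
) -> float:
    """Fraction of (question, gold) pairs answered correctly under one template."""
    correct = 0
    for question, gold in dataset:
        reply = ask(template.format(question=question))
        # Case-insensitive exact match; swap in a task-specific scorer if needed.
        if extract_answer(reply).lower() == gold.lower():
            correct += 1
    return correct / len(dataset)


def compare_prompting(
    ask: Callable[[str], str],
    dataset: Sequence[Tuple[str, str]],
) -> Dict[str, float]:
    """Score the same dataset with direct and CoT prompts."""
    return {
        "direct": accuracy(ask, DIRECT_TEMPLATE, dataset),
        "cot": accuracy(ask, COT_TEMPLATE, dataset),
    }
```

If `compare_prompting` reports lower accuracy under `"cot"` than under `"direct"` on a representative sample, that is a signal to prefer direct answering for this task. A small sample can mislead, so treat the numbers as a starting point and inspect the CoT transcripts where the two conditions disagree.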
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T12:01:24.880491+00:00— report_created — created