Report #76416
[counterintuitive] Chain-of-thought prompting always improves reasoning accuracy
Do not reflexively add CoT to every task. Test with and without CoT. For perceptual, pattern-matching, or 'intuitive' tasks where the model's internal representation already captures the answer, CoT can force verbalization of reasoning the model cannot accurately express, degrading performance.
Journey Context:
CoT is widely recommended as a universal reasoning enhancer. But research identifies a critical failure mode: on tasks where the model has strong internal representations but weak verbalization ability, forcing CoT makes the model generate plausible-sounding but incorrect intermediate steps. Because the final answer is conditioned on those steps, the errors carry through to a wrong answer, whereas without CoT the model would have gone directly to the correct answer from its internal representation. This is analogous to humans: asking someone to explain every step of how they recognize a face can make them worse at face recognition. CoT helps on tasks that genuinely benefit from decomposition (multi-step math, logic puzzles) but can hurt on tasks that rely on holistic pattern matching.
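The advice to test with and without CoT can be sketched as a small A/B harness. This is a minimal illustration, not a reference implementation: `compare_cot`, `accuracy`, `extract_answer`, the two prompt suffixes, and the "final answer on the last line" convention are all assumptions made for this example, and `toy_model` is a contrived stand-in that exists only so the code runs. Swap in a real model call and a labeled evaluation set for the task in question.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical prompt suffixes; tune the wording for your own model and task.
DIRECT_SUFFIX = "\nAnswer with the final answer only."
COT_SUFFIX = "\nThink step by step, then give the final answer on the last line."

Example = Tuple[str, str]  # (question, gold answer)


def extract_answer(completion: str) -> str:
    """Return the last non-empty line (assumed convention for the final answer)."""
    lines = [ln.strip() for ln in completion.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def accuracy(model: Callable[[str], str],
             examples: Sequence[Example],
             suffix: str) -> float:
    """Fraction of examples whose extracted answer matches gold (case-insensitive)."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(
        extract_answer(model(question + suffix)).lower() == gold.lower()
        for question, gold in examples
    )
    return correct / len(examples)


def compare_cot(model: Callable[[str], str],
                examples: Sequence[Example]) -> Dict[str, float]:
    """Score the same examples with and without the CoT instruction."""
    return {
        "direct": accuracy(model, examples, DIRECT_SUFFIX),
        "cot": accuracy(model, examples, COT_SUFFIX),
    }


def toy_model(prompt: str) -> str:
    """Contrived stand-in, NOT a real model: it only exercises the harness."""
    if "step by step" in prompt:
        return "It has four sides, so it might be a square.\nsquare"
    return "rectangle"


if __name__ == "__main__":
    examples = [("Which shape is a 2x5 box?", "rectangle")]
    print(compare_cot(toy_model, examples))
```

Running both conditions on the same labeled examples keeps the comparison fair. On a real task, use enough examples that the difference between the two scores is not just noise, and keep CoT only where it measurably helps.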
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:51:22.970175+00:00— report_created — created