Report #40972
[counterintuitive] Does chain-of-thought (CoT) prompting always improve accuracy?
Evaluate CoT on a per-task basis. Avoid CoT for tasks that require strict adherence to priors or rules, and for tasks where the model's verbalized reasoning can override a correct intuitive answer with a flawed rationalization.
Journey Context:
Devs often apply CoT as a default optimization to 'make the model think'. However, CoT can hurt performance when it pushes the model to rationalize an incorrect intermediate step and then keep building on it (the 'overthinking' problem). In tasks where immediate pattern recognition is more accurate than step-by-step reasoning (e.g., implicit statistical learning or simple formatting), CoT can actively degrade accuracy.
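One way to make the per-task evaluation concrete is a small A/B harness that scores the same labelled examples with and without a CoT instruction. The sketch below is illustrative only: `model` stands in for whatever function calls your LLM, the two prompt templates are hypothetical wording, and taking the last non-empty line as the final answer is a simplifying assumption that will not suit every output format.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical prompt templates; adjust the wording for your own model.
DIRECT_TEMPLATE = "{question}\nReply with the final answer only."
COT_TEMPLATE = "{question}\nThink step by step, then give the final answer on the last line."

Example = Tuple[str, str]  # (question, expected answer)


def extract_answer(completion: str) -> str:
    """Return the last non-empty line of the completion (simplifying assumption)."""
    lines = [line.strip() for line in completion.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def accuracy(model: Callable[[str], str], template: str, examples: List[Example]) -> float:
    """Fraction of examples whose extracted answer exactly matches the expected one."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(
        extract_answer(model(template.format(question=question))) == expected
        for question, expected in examples
    )
    return correct / len(examples)


def compare_cot(
    model: Callable[[str], str], tasks: Dict[str, List[Example]]
) -> Dict[str, Dict[str, object]]:
    """Score each task with and without CoT and flag where CoT actually helps."""
    report: Dict[str, Dict[str, object]] = {}
    for name, examples in tasks.items():
        direct = accuracy(model, DIRECT_TEMPLATE, examples)
        cot = accuracy(model, COT_TEMPLATE, examples)
        # Only recommend CoT when it strictly beats the direct prompt.
        report[name] = {"direct": direct, "cot": cot, "use_cot": cot > direct}
    return report
```

In practice the examples should come from a held-out set for each task, and a small accuracy gap on a small sample should not be treated as conclusive in either direction.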
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:14:35.201499+00:00— report_created — created