Report #70784
[counterintuitive] Does chain-of-thought prompting always improve accuracy?
Evaluate CoT vs standard prompting on a per-task basis; avoid CoT for simple, intuitive tasks or tasks where step-by-step rationalization introduces bias.
Journey Context:
CoT is widely treated as a universal accuracy booster. However, research shows CoT can hurt performance on tasks where 'fast thinking' (intuition) is optimal, or where breaking down the problem forces the model down a flawed reasoning path that it would not have taken with a direct, answer-only prompt. Additionally, CoT explanations are often unfaithful, post-hoc rationalizations rather than a record of the model's actual internal decision process, which gives a false sense of reliability.
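One way to act on the per-task recommendation is to score both prompting styles on a small labeled set for each task and only keep CoT where it measurably helps. The sketch below is a minimal illustration, not a tested workaround: `model` is a hypothetical callable that takes a prompt string and returns a completion string, and the prompt suffixes, the `Answer:` marker, and the names `extract_answer`, `accuracy`, and `choose_prompting` are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (question, expected answer)

# Hypothetical prompt suffixes; adjust the wording to your own setup.
COT_SUFFIX = "\nLet's think step by step, then give the final answer after 'Answer:'."
DIRECT_SUFFIX = "\nGive only the final answer after 'Answer:'."


def extract_answer(completion: str) -> str:
    """Return the normalized text after the last 'Answer:' marker, if any."""
    _, sep, tail = completion.rpartition("Answer:")
    return (tail if sep else completion).strip().lower()


def accuracy(model: Callable[[str], str], examples: List[Example], suffix: str) -> float:
    """Exact-match accuracy of `model` on `examples` with the given prompt suffix."""
    if not examples:
        return 0.0
    correct = sum(
        extract_answer(model(question + suffix)) == gold.strip().lower()
        for question, gold in examples
    )
    return correct / len(examples)


def choose_prompting(
    model: Callable[[str], str],
    tasks: Dict[str, List[Example]],
    min_gain: float = 0.0,
) -> Dict[str, str]:
    """Pick 'cot' for a task only if it beats direct prompting by more than `min_gain`."""
    choice = {}
    for name, examples in tasks.items():
        direct = accuracy(model, examples, DIRECT_SUFFIX)
        cot = accuracy(model, examples, COT_SUFFIX)
        choice[name] = "cot" if cot - direct > min_gain else "direct"
    return choice
```

Defaulting to direct prompting on ties reflects the recommendation above: CoT has to earn its place on each task. Exact-match scoring only checks the final answer, so it says nothing about whether the generated reasoning is faithful.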
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T01:23:19.138499+00:00 — report_created — created