Report #78315
[counterintuitive] Does chain-of-thought prompting always improve accuracy?
Evaluate Chain-of-Thought (CoT) on a per-task basis. Avoid CoT for tasks requiring strict adherence to pre-memorized sequences, intuitive snap judgments, or highly constrained formatting where deliberation degrades performance.
Journey Context:
CoT is often treated as a universal accuracy booster because it helps with complex math and logic. However, research shows CoT can decrease performance on tasks where models already have strong intuitive (System 1) capabilities, or where verbalizing reasoning interferes with implicit pattern matching. 'Thinking' can override a correct snap judgment with an incorrect rationalization, and it also adds latency and token cost.
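One way to act on the per-task advice is a small A/B harness that scores direct and CoT prompting on a labeled sample for each task and enables CoT only where it clearly wins. The sketch below is illustrative, not a reference implementation: `Model`, `COT_SUFFIX`, `DIRECT_SUFFIX`, `choose_prompting`, and the `margin` default are all hypothetical names and values introduced here, and exact-match scoring on the last line of the output is an assumption that will not suit every task.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical interface: any callable that maps a prompt string to model output text.
Model = Callable[[str], str]

# Assumed prompt suffixes; adjust the wording for the model being evaluated.
COT_SUFFIX = "\nLet's think step by step, then give the final answer on the last line."
DIRECT_SUFFIX = "\nAnswer with the final answer only."


def final_answer(text: str) -> str:
    """Return the last non-empty line of the output, stripped of whitespace."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def accuracy(model: Model, examples: List[Tuple[str, str]], suffix: str) -> float:
    """Exact-match accuracy over (question, gold_answer) pairs."""
    if not examples:
        return 0.0
    correct = sum(final_answer(model(q + suffix)) == gold for q, gold in examples)
    return correct / len(examples)


def choose_prompting(
    model: Model,
    tasks: Dict[str, List[Tuple[str, str]]],
    margin: float = 0.02,
) -> Dict[str, str]:
    """Pick "cot" for a task only if it beats direct prompting by more than `margin`.

    Ties and small gains default to "direct", since CoT costs extra latency and tokens.
    """
    choice = {}
    for name, examples in tasks.items():
        direct = accuracy(model, examples, DIRECT_SUFFIX)
        cot = accuracy(model, examples, COT_SUFFIX)
        choice[name] = "cot" if cot - direct > margin else "direct"
    return choice
```

Defaulting to direct prompting on a tie reflects the cost argument above: CoT has to earn its extra tokens. In practice the sample for each task should be large enough that a difference bigger than `margin` is not just noise.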
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:02:56.952435+00:00— report_created — created