Report #40091
[counterintuitive] Always use chain-of-thought for better reasoning on hard problems
Test with and without chain-of-thought for each task type. Use CoT for tasks requiring multi-step symbolic reasoning where each step is verifiable (math, logic puzzles). Avoid CoT for tasks where the model has strong pattern-matching intuition that verbalization can corrupt, or where the reasoning steps introduce compounding errors.
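One way to run the with/without comparison is a small per-task-type harness. The sketch below is illustrative only: `Model`, `compare_cot`, `extract_answer`, and both prompt templates are hypothetical names invented for this example, and the model is assumed to be any callable that takes a prompt string and returns text. Exact-match scoring on the last output line is also an assumption; most real tasks need a more forgiving comparison.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch, not part of any real library.
Example = Tuple[str, str, str]   # (task_type, question, expected_answer)
Model = Callable[[str], str]     # prompt in, raw model text out

# Assumed prompt wordings; adjust to whatever the model under test expects.
DIRECT_TEMPLATE = "{question}\nAnswer with only the final answer."
COT_TEMPLATE = (
    "{question}\nThink step by step, then give the final answer on the last line."
)


def extract_answer(output: str) -> str:
    """Treat the last non-empty line of the output as the final answer."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def compare_cot(model: Model, examples: List[Example]) -> Dict[str, Dict[str, float]]:
    """Return exact-match accuracy per task type for direct and CoT prompting."""
    totals: Dict[str, int] = {}
    correct: Dict[str, Dict[str, int]] = {}
    for task_type, question, expected in examples:
        totals[task_type] = totals.get(task_type, 0) + 1
        scores = correct.setdefault(task_type, {"direct": 0, "cot": 0})
        for mode, template in (("direct", DIRECT_TEMPLATE), ("cot", COT_TEMPLATE)):
            output = model(template.format(question=question))
            if extract_answer(output) == expected:
                scores[mode] += 1
    return {
        task_type: {
            mode: correct[task_type][mode] / totals[task_type]
            for mode in ("direct", "cot")
        }
        for task_type in totals
    }
```

Calling `compare_cot(my_model, examples)` returns a mapping such as `{"math": {"direct": ..., "cot": ...}, ...}`, so the decision to enable CoT can be made per task type rather than globally. With small evaluation sets the accuracy gap can be noise, so differences are worth confirming on more examples before acting on them.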
Journey Context:
The influential 'chain-of-thought' papers created a widespread belief that CoT universally improves reasoning. This is wrong in important cases. CoT can hurt performance when: (1) the model's direct pattern-matching is better than its verbalized reasoning, similar to how humans perform worse when asked to explain instinctive decisions (verbal overshadowing); (2) the reasoning steps compound errors: if step 2 depends on step 1 and step 1 is wrong, every later step inherits the error, whereas a direct answer might have relied on holistic pattern matching that bypassed it; (3) the task is simple enough that CoT adds noise and latency without benefit. CoT is most beneficial for tasks where reasoning is genuinely compositional and each step can be verified independently (formal logic, arithmetic, algorithmic procedures). It is least beneficial for tasks requiring holistic judgment or pattern recognition, or where the model's verbalized reasoning doesn't match its actual computation (unfaithful explanations). The practical takeaway: CoT is a technique with a domain of applicability, not a universal improvement.
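The compounding-error point in (2) can be made concrete with a deliberately simplified model. The sketch below assumes each reasoning step is correct independently with the same probability and that a single wrong step ruins the final answer. Both are assumptions made for illustration: real models can sometimes recover from an earlier mistake, and step errors are rarely independent. The function name `chain_success_probability` is hypothetical.

```python
def chain_success_probability(step_accuracy: float, num_steps: int) -> float:
    """Probability that every step in a chain is correct.

    Simplifying assumptions (not claims about any real model):
    steps fail independently, and any single failure is fatal.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_accuracy ** num_steps


if __name__ == "__main__":
    # Show how quickly whole-chain reliability decays as chains get longer.
    for steps in (1, 5, 10, 20):
        probability = chain_success_probability(0.95, steps)
        print(f"{steps:>2} steps at 95% per step: {probability:.2f}")
```

Under these assumptions, a chain of 10 steps at 95% per-step accuracy succeeds only about 60% of the time, and a chain of 20 steps about 36% of the time. A long chain can therefore underperform a direct answer whose one-shot accuracy is moderately high, which is why verifying individual steps matters most where CoT is used.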
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T21:45:49.023160+00:00— report_created — created