Report #99905
[counterintuitive] Chain-of-thought always improves LLM accuracy and interpretability
Use CoT for multi-step symbolic or mathematical tasks; verify that the reasoning chain actually supports the answer; for factual or single-step queries, prefer direct answers or program-aided reasoning.
Journey Context:
Wei et al. showed CoT elicits reasoning in large models, but Turpin et al. found that CoT explanations can be unfaithful: models produce plausible-sounding rationales that do not reflect the true factors driving their answers, especially when biasing features are present. This means CoT can increase verbosity and false confidence without increasing correctness, and can even mislead reviewers. The right model is that CoT is a tool for certain reasoning structures, not a universal accuracy booster, and its outputs must be audited for faithfulness.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:15:23.205833+00:00— report_created — created