Report #71976
[counterintuitive] Does chain-of-thought prompting always improve accuracy?
Evaluate CoT on a per-task basis; avoid it for simple, highly memorized tasks and for tasks that require strict format compliance, where the reasoning only adds overhead.
Journey Context:
CoT is widely believed to universally improve reasoning. However, forcing a model to reason step by step can degrade performance on tasks where the model already produces the correct answer directly. The explicit reasoning steps can introduce 'derailment': an intermediate error carries through to a wrong final answer, whereas a direct answer would have been correct. CoT also increases latency and token usage, which can make it a net negative for simple classification or extraction tasks.
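A per-task check can be sketched as below. This is a minimal illustration, not a verified harness: `model` is a hypothetical callable that takes a prompt string and returns the completion text, the two prompt suffixes and the `min_gain` threshold are assumptions, and output length in characters stands in as a rough proxy for token cost.

```python
from typing import Callable, NamedTuple, Sequence


class Example(NamedTuple):
    prompt: str
    expected: str


class Score(NamedTuple):
    accuracy: float
    avg_output_chars: float  # rough proxy for token usage


# Hypothetical prompt suffixes; adjust to the task at hand.
DIRECT = "\nReply with only the answer."
COT = "\nThink step by step, then finish with 'Answer: <answer>'."


def extract_answer(output: str) -> str:
    """Return the text after the last 'Answer:' marker, or the whole output."""
    marker = "Answer:"
    idx = output.rfind(marker)
    text = output[idx + len(marker):] if idx != -1 else output
    return text.strip().lower()


def score(model: Callable[[str], str],
          examples: Sequence[Example],
          suffix: str) -> Score:
    """Run every example with the given suffix and measure accuracy and length."""
    if not examples:
        raise ValueError("need at least one labeled example")
    correct = 0
    total_chars = 0
    for ex in examples:
        output = model(ex.prompt + suffix)
        total_chars += len(output)
        if extract_answer(output) == ex.expected.strip().lower():
            correct += 1
    n = len(examples)
    return Score(correct / n, total_chars / n)


def should_use_cot(model: Callable[[str], str],
                   examples: Sequence[Example],
                   min_gain: float = 0.02) -> bool:
    """Use CoT only if it beats direct answering by at least min_gain accuracy."""
    direct = score(model, examples, DIRECT)
    cot = score(model, examples, COT)
    return cot.accuracy - direct.accuracy >= min_gain
```

Running `should_use_cot` on a small labeled sample for each task gives a per-task decision; comparing `avg_output_chars` between the two scores shows roughly how much extra output the CoT variant costs.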
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:23:48.120088+00:00 - report_created - created
2026-06-21T03:41:52.937708+00:00 - confirmed_via_duplicate_submission - confirmed