Report #83711
[counterintuitive] Does chain-of-thought prompting always improve accuracy?
Evaluate chain-of-thought (CoT) prompting per task rather than enabling it by default. Avoid CoT for tasks that require strict rule adherence, where the generated reasoning steps can introduce logical drift, and for latency-sensitive tasks that need fast responses.
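One way to act on the per-task recommendation is a small A/B harness that runs the same labelled examples through a direct prompt and a CoT prompt, then compares exact-match accuracy and output length. This is a minimal sketch under stated assumptions: the prompt templates, the 'Answer:' marker and the `call_model` callable (any function that sends a prompt to your model and returns its text) are hypothetical, and `fake_model` is a stub standing in for a real LLM client so the example runs offline. Output length is counted in whitespace-separated words as a rough proxy for tokens.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical prompt templates; adapt the wording to your own task.
DIRECT_TEMPLATE = "{question}\nRespond with the answer only."
COT_TEMPLATE = (
    "{question}\nThink step by step, then give the final answer "
    "on a last line starting with 'Answer:'."
)


@dataclass
class ModeResult:
    accuracy: float            # exact-match accuracy, 0.0-1.0
    mean_output_tokens: float  # whitespace-split word count, a rough token proxy


def extract_answer(output: str, cot: bool) -> str:
    """Return the final answer; for CoT, keep only the text after the last 'Answer:'."""
    if cot:
        marker = "Answer:"
        idx = output.rfind(marker)
        if idx == -1:
            return ""  # no parsable final answer counts as a miss
        output = output[idx + len(marker):]
    return output.strip()


def evaluate_mode(
    examples: list[tuple[str, str]],
    call_model: Callable[[str], str],
    cot: bool,
) -> ModeResult:
    """Run every (question, expected) pair through one prompting mode."""
    if not examples:
        raise ValueError("need at least one example")
    template = COT_TEMPLATE if cot else DIRECT_TEMPLATE
    correct = 0
    total_tokens = 0
    for question, expected in examples:
        output = call_model(template.format(question=question))
        total_tokens += len(output.split())
        if extract_answer(output, cot) == expected:
            correct += 1
    n = len(examples)
    return ModeResult(correct / n, total_tokens / n)


def compare_modes(
    examples: list[tuple[str, str]], call_model: Callable[[str], str]
) -> dict[str, ModeResult]:
    """Evaluate direct and CoT prompting on the same examples."""
    return {
        "direct": evaluate_mode(examples, call_model, cot=False),
        "cot": evaluate_mode(examples, call_model, cot=True),
    }


def fake_model(prompt: str) -> str:
    """Offline stub: answers directly, but 'talks itself out of' the answer under CoT."""
    if "step by step" in prompt:
        return (
            "The filing date is 2024, but the fiscal year probably started "
            "earlier, so...\nAnswer: 2023"
        )
    return "2024"


examples = [("Extract the year from: 'Filed in 2024.'", "2024")]
results = compare_modes(examples, fake_model)
```

Exact match is deliberate here: for extraction and formatting tasks, a CoT answer that drifts even slightly should count as wrong. Replace `fake_model` with a real client call and use a held-out sample for each task.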
Journey Context:
CoT is often treated as a universal accuracy booster. However, on simple tasks or tasks that require strict rule-following (e.g., formatting or exact extraction), CoT can cause 'reasoning drift', where the model talks itself out of the correct answer. It also drastically increases latency and token cost, because the model must generate its reasoning tokens before it produces the answer. In many zero-shot classification tasks, direct prompting outperforms CoT because the model overthinks and finds spurious patterns in its own generated reasoning.
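Given per-task measurements, a decision rule that weighs the accuracy difference against latency and token budgets might look like the sketch below. `ModeStats`, `choose_mode` and the default `min_gain` of 0.02 are hypothetical; the numbers in the usage lines are illustrative inputs, not measured results, and real stats would come from your own held-out evaluation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModeStats:
    accuracy: float       # held-out accuracy for this task, 0.0-1.0
    mean_tokens: float    # mean output tokens per request
    p50_latency_s: float  # median end-to-end latency in seconds


def choose_mode(
    direct: ModeStats,
    cot: ModeStats,
    min_gain: float = 0.02,  # hypothetical threshold; tune per task
    latency_budget_s: Optional[float] = None,
    token_budget: Optional[float] = None,
) -> str:
    """Return 'cot' only when its accuracy gain justifies its extra cost."""
    # Hard constraints first: a latency or cost ceiling rules CoT out outright.
    if latency_budget_s is not None and cot.p50_latency_s > latency_budget_s:
        return "direct"
    if token_budget is not None and cot.mean_tokens > token_budget:
        return "direct"
    # CoT must beat direct prompting by a margin; this also catches CoT being worse.
    if cot.accuracy - direct.accuracy < min_gain:
        return "direct"
    return "cot"


# Illustrative numbers only, not measured results.
extraction = choose_mode(ModeStats(0.97, 4, 0.4), ModeStats(0.93, 180, 3.1))
math_task = choose_mode(ModeStats(0.55, 6, 0.5), ModeStats(0.78, 150, 2.8))
math_fast = choose_mode(
    ModeStats(0.55, 6, 0.5), ModeStats(0.78, 150, 2.8), latency_budget_s=1.0
)
```

Checking the hard budgets before accuracy reflects the point above: even when CoT is more accurate, a low-latency path may still call for direct prompting.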
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T23:05:47.585808+00:00— report_created — created