Report #59894
[counterintuitive] Chain-of-thought (CoT) prompting always improves accuracy
Evaluate CoT on a per-task basis rather than enabling it by default. Avoid CoT for tasks that require strict output formatting or low-latency execution, and for tasks where the model has strong prior biases that CoT will merely rationalize rather than correct.
Journey Context:
CoT is often treated as a universal accuracy booster. However, CoT can hurt accuracy on tasks where the model's intuition is better than its explicit reasoning (e.g., simple classification, implicit statistical learning), or where errors compound across reasoning steps. CoT also increases latency and token usage, and models often exhibit 'right answer, wrong reasoning' (post-hoc rationalization), so a plausible-looking chain is not evidence that the answer was actually derived from it. Irrelevant reasoning steps or context can also distract the model.
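A per-task comparison can be sketched as a small evaluation harness. The following is a minimal sketch under stated assumptions, not a reference implementation: the `Predictor` signature, the `EvalResult` type, the exact-match scoring, and the `min_gain` threshold of 0.02 are all hypothetical choices made for illustration. Each predictor is assumed to wrap a model call for one prompting strategy and return the answer together with the number of tokens it consumed.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

# Hypothetical signature: a predictor wraps one prompting strategy and
# maps an input string to (answer, tokens_used).
Predictor = Callable[[str], Tuple[str, int]]


@dataclass
class EvalResult:
    accuracy: float      # fraction of exact-match answers
    mean_tokens: float   # average tokens consumed per example


def evaluate(predict: Predictor, examples: Sequence[Tuple[str, str]]) -> EvalResult:
    """Score one prompting strategy on labeled (input, expected) pairs."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = 0
    tokens = 0
    for text, expected in examples:
        answer, used = predict(text)
        # Assumption: case-insensitive exact match is an adequate metric.
        correct += int(answer.strip().lower() == expected.strip().lower())
        tokens += used
    n = len(examples)
    return EvalResult(accuracy=correct / n, mean_tokens=tokens / n)


def should_use_cot(direct: EvalResult, cot: EvalResult, min_gain: float = 0.02) -> bool:
    """Enable CoT only if it beats direct prompting by at least min_gain.

    min_gain is an arbitrary illustrative threshold; choose it so the
    accuracy gain justifies the extra tokens and latency for the task.
    """
    return (cot.accuracy - direct.accuracy) >= min_gain
```

Running `evaluate` once with a direct-answer predictor and once with a CoT predictor on the same held-out examples yields two `EvalResult` values; `should_use_cot` then makes the decision explicit, and `mean_tokens` shows what the CoT variant costs. Exact-match scoring only suits tasks with short, canonical answers, so other tasks would need a different metric.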
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T07:01:17.090571+00:00— report_created — created