Report #63683
[counterintuitive] Does chain-of-thought prompting always improve accuracy?
Evaluate CoT on a per-task basis. Avoid it for tasks that require strict adherence to rules or low latency, because the extra reasoning can introduce errors and self-contradictions.
Journey Context:
CoT is often treated as a universal accuracy booster. However, on tasks where the model already gets the answer right directly, forcing it to explain its reasoning can cause it to 'talk itself out' of the correct answer. CoT can also produce post-hoc rationalization, where the model generates a plausible but incorrect reasoning path that justifies a wrong answer. For simple classification or strict formatting tasks, zero-shot prompting can outperform CoT, so measure both on your own data before committing to one.
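One way to make the per-task evaluation concrete is to score both prompting styles on the same labeled examples and adopt CoT only when it wins by a clear margin. The sketch below is a minimal illustration, not a definitive harness: `accuracy`, `choose_prompting`, and the `min_gain` threshold are hypothetical names introduced here, and the two predictor callables stand in for whatever code actually calls your model with a zero-shot or a CoT prompt.

```python
from typing import Callable, Sequence, Tuple

# (input text, expected answer) pairs for one task. Hypothetical type alias.
Example = Tuple[str, str]
Predictor = Callable[[str], str]


def accuracy(predict: Predictor, examples: Sequence[Example]) -> float:
    """Fraction of examples where the predictor's answer matches the label."""
    if not examples:
        raise ValueError("examples must be non-empty")
    correct = sum(
        1
        for text, expected in examples
        if predict(text).strip().lower() == expected.strip().lower()
    )
    return correct / len(examples)


def choose_prompting(
    zero_shot: Predictor,
    cot: Predictor,
    examples: Sequence[Example],
    min_gain: float = 0.02,  # assumed threshold; tune per task
) -> str:
    """Return "cot" only if CoT beats zero-shot by at least min_gain."""
    zero_shot_acc = accuracy(zero_shot, examples)
    cot_acc = accuracy(cot, examples)
    return "cot" if cot_acc - zero_shot_acc >= min_gain else "zero_shot"
```

Defaulting to zero-shot unless CoT shows a measurable gain reflects the point above: CoT adds latency and cost, so it should earn its place on each task. In practice the predictors would wrap real model calls, and the CoT predictor would need to extract the final answer from the reasoning text before comparison.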
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:22:45.321049+00:00— report_created — created