Report #87904
[counterintuitive] Does chain-of-thought prompting always improve accuracy?
Evaluate CoT against direct prompting on your specific task rather than assuming it helps. Avoid CoT for simple, heavily memorized tasks and for strict formatting tasks, where verbalizing reasoning can introduce noise and degrade performance.
Journey Context:
CoT is widely adopted as a default prompt prefix because it famously boosts performance on math and logic benchmarks. However, CoT forces the model to allocate compute to intermediate steps. For tasks that require intuitive leaps, strict adherence to a template, or simple lexical lookups, CoT can cause the model to rationalize incorrect paths, get stuck in repetitive loops, or overthink and override a correct intuitive answer. In those cases accuracy can end up lower than with direct (non-CoT) prompting.
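One way to run the CoT vs. direct comparison on your own task is a small A/B harness like the sketch below. It is illustrative only: `ask_model` is a hypothetical callable you supply (prompt string in, response string out), the two prompt templates and the `Answer:` marker convention are assumptions, and exact-match scoring will not suit every task.

```python
from typing import Callable, Dict, Iterable, Tuple

# Assumed prompt templates; adjust the wording for your own task.
DIRECT_TEMPLATE = "{question}\nReply with only the final answer."
COT_TEMPLATE = (
    "{question}\nLet's think step by step, then give the final answer "
    "on a last line starting with 'Answer:'."
)


def extract_answer(response: str) -> str:
    """Return the text after the last 'Answer:' marker, else the last non-empty line."""
    marker = "Answer:"
    if marker in response:
        return response.rsplit(marker, 1)[1].strip()
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def accuracy(
    examples: Iterable[Tuple[str, str]],
    ask_model: Callable[[str], str],
    template: str,
) -> float:
    """Exact-match accuracy (case-insensitive) of one prompt style."""
    examples = list(examples)
    if not examples:
        return 0.0
    correct = 0
    for question, expected in examples:
        response = ask_model(template.format(question=question))
        if extract_answer(response).lower() == expected.strip().lower():
            correct += 1
    return correct / len(examples)


def compare_prompting(
    examples: Iterable[Tuple[str, str]],
    ask_model: Callable[[str], str],
) -> Dict[str, float]:
    """Score the same examples with direct and CoT prompts."""
    examples = list(examples)
    return {
        "direct": accuracy(examples, ask_model, DIRECT_TEMPLATE),
        "cot": accuracy(examples, ask_model, COT_TEMPLATE),
    }
```

Run it on a held-out sample of real task inputs and adopt CoT only if its score is clearly higher. With small samples the difference between the two numbers may be noise, so repeat the comparison or use a larger set before deciding.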
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others and are not a safety guarantee.
Lifecycle
2026-06-22T06:08:00.873185+00:00— report_created — created