Report #61873
[counterintuitive] Does chain-of-thought prompting always improve LLM accuracy?
Evaluate CoT per task. Use zero-shot direct answering for simple, highly memorized tasks and for tasks with strict output formats, and reserve CoT for tasks that genuinely require multi-step logic or arithmetic.
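One way to act on this recommendation is a small routing step that picks a prompt style per task type. The sketch below is illustrative only: the task labels, the `COT_TASKS` set, the template wording, and the `build_prompt` helper are all assumptions and not part of the report. Which tasks belong in the CoT set should be decided by measurement, not by this table.

```python
# Hypothetical task labels; membership should be decided by measuring each task.
COT_TASKS = {"multi_step_logic", "arithmetic"}

# Illustrative prompt wording, not a recommended canonical phrasing.
DIRECT_TEMPLATE = "{question}\nReply with the final answer only."
COT_TEMPLATE = (
    "{question}\n"
    "Think step by step, then give the final answer on a line starting with 'Answer:'."
)


def build_prompt(task_type: str, question: str) -> str:
    """Return a CoT prompt for task types in COT_TASKS, a direct prompt otherwise."""
    template = COT_TEMPLATE if task_type in COT_TASKS else DIRECT_TEMPLATE
    return template.format(question=question)


if __name__ == "__main__":
    print(build_prompt("arithmetic", "What is 17 * 23?"))
    print(build_prompt("sentiment_classification", "Review: 'Great phone.' Sentiment?"))
```

Defaulting unknown task types to the direct prompt reflects the report's point that CoT is a poor default; a task moves into `COT_TASKS` only after it has been shown to benefit.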
Journey Context:
CoT is widely treated as a universal accuracy booster. However, on tasks the model already answers correctly from strongly memorized patterns, forcing CoT can cause 'derailment': the model talks itself out of the correct answer because the verbalized reasoning steps introduce probabilistic drift, where an early misstep in the generated reasoning conditions everything that follows. CoT also increases latency and token usage, because the model must generate the reasoning before it produces the answer, which makes it a poor default for simple classification or extraction.
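A per-task comparison of this trade-off can be sketched as a small harness that runs the same labeled examples through both prompt styles and reports accuracy alongside output length. Everything here is an assumption made for illustration: the `ask` callable stands in for whatever model client is in use, the prompt suffixes and the `Answer:` marker are arbitrary conventions, and whitespace word count is only a rough proxy for token usage. The `stub_model` returns canned text so the sketch runs offline; its outputs say nothing about how a real model behaves.

```python
from typing import Callable, Dict, Iterable, Tuple

# Illustrative prompt suffixes (assumed wording).
DIRECT_SUFFIX = "\nReply with the final answer only."
COT_SUFFIX = "\nThink step by step, then end with a line starting with 'Answer:'."


def extract_answer(output: str) -> str:
    """Return the text after the last 'Answer:' marker, or the whole output if absent."""
    marker = "Answer:"
    idx = output.rfind(marker)
    text = output[idx + len(marker):] if idx != -1 else output
    return text.strip().lower()


def evaluate(
    ask: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
    suffix: str,
) -> Tuple[float, float]:
    """Return (accuracy, mean output length in whitespace-separated words)."""
    correct = 0
    total_words = 0
    count = 0
    for question, expected in examples:
        output = ask(question + suffix)
        count += 1
        total_words += len(output.split())
        if extract_answer(output) == expected.strip().lower():
            correct += 1
    if count == 0:
        raise ValueError("examples must not be empty")
    return correct / count, total_words / count


def compare(
    ask: Callable[[str], str],
    examples: Iterable[Tuple[str, str]],
) -> Dict[str, Tuple[float, float]]:
    """Run the same examples with direct and CoT prompts."""
    examples = list(examples)  # allow generators to be evaluated twice
    return {
        "direct": evaluate(ask, examples, DIRECT_SUFFIX),
        "cot": evaluate(ask, examples, COT_SUFFIX),
    }


def stub_model(prompt: str) -> str:
    """Offline stand-in for a real model call; the canned replies are made up."""
    if "step by step" in prompt:
        return "First consider the tone. The wording is mixed. Answer: negative"
    return "positive"


if __name__ == "__main__":
    data = [("Review: 'Great phone.' Sentiment (positive/negative)?", "positive")]
    for style, (accuracy, mean_words) in compare(stub_model, data).items():
        print(f"{style}: accuracy={accuracy:.2f}, mean_output_words={mean_words:.1f}")
```

To use it on a real task, replace `stub_model` with a function that calls the model under test, and run it on a labeled sample large enough for the accuracy difference to be meaningful. Measuring wall-clock latency around each `ask` call would be a natural extension.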
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:20:26.398296+00:00— report_created — created