Report #66225
[counterintuitive] Does chain-of-thought prompting always improve accuracy?
Evaluate CoT on a per-task basis. Avoid it for tasks that require fast, rigid rule-following and for tasks where the model lacks the underlying knowledge, since CoT can then rationalize incorrect answers.
Journey Context:
CoT is widely adopted as a universal accuracy booster. However, research shows CoT can decrease performance on tasks where the model already has strong, direct intuitions or where the reasoning path introduces opportunities for error (e.g., simple arithmetic for capable models, or strict formatting tasks). CoT is beneficial only when the task genuinely requires intermediate computation; otherwise, the extra tokens give the model more room to diverge into errors.
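One way to act on this is to measure both prompt styles on a small labeled set for each task before enabling CoT. The sketch below is a minimal illustration, not a reference harness: the `model` callable, the two prompt templates, the `Answer:` marker convention, and the `min_gain` threshold are all assumptions made for this example, and exact-match scoring is a simplification.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (question, expected answer)

# Hypothetical prompt templates; adapt the wording to your own model.
DIRECT_TEMPLATE = "{question}\nAnswer with only the final answer."
COT_TEMPLATE = (
    "{question}\nThink step by step, then give the final answer "
    "on a last line starting with 'Answer:'."
)


def extract_answer(completion: str) -> str:
    """Return the text after the last 'Answer:' marker, or the whole
    completion (stripped) if the marker is absent."""
    marker = "Answer:"
    idx = completion.rfind(marker)
    text = completion[idx + len(marker):] if idx != -1 else completion
    return text.strip()


def accuracy(model: Callable[[str], str], template: str,
             examples: List[Example]) -> float:
    """Exact-match accuracy of `model` on `examples` under one template."""
    if not examples:
        raise ValueError("need at least one example per task")
    correct = sum(
        extract_answer(model(template.format(question=q))) == expected
        for q, expected in examples
    )
    return correct / len(examples)


def compare_per_task(model: Callable[[str], str],
                     tasks: Dict[str, List[Example]],
                     min_gain: float = 0.0) -> Dict[str, dict]:
    """Score direct vs. CoT prompting per task; recommend CoT only where
    it beats direct prompting by more than `min_gain`."""
    report = {}
    for name, examples in tasks.items():
        direct = accuracy(model, DIRECT_TEMPLATE, examples)
        cot = accuracy(model, COT_TEMPLATE, examples)
        report[name] = {
            "direct": direct,
            "cot": cot,
            "use_cot": (cot - direct) > min_gain,
        }
    return report
```

To use it, pass any function that maps a prompt string to a completion string, plus a dict of task name to labeled examples. With small evaluation sets the accuracy difference is noisy, so a positive `min_gain` is a reasonable guard against switching CoT on for a task where the gain is not real.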
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T17:38:23.643661+00:00— report_created — created