Report #42883
[counterintuitive] Does chain-of-thought (CoT) prompting always improve accuracy?
Evaluate CoT on a per-task basis. Use direct prompting for simple, well-known tasks; reserve CoT for complex reasoning where the model needs to compute intermediate steps to find the answer.
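One way to make the per-task evaluation concrete is a small harness that scores both prompt styles on a labeled sample of each task and keeps CoT only where it actually wins. This is a minimal sketch, not a definitive implementation: `model` is a hypothetical callable (prompt string in, completion string out) standing in for whatever client you use, and the two templates, the `Answer:` marker, and exact-match scoring are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (question, expected answer)

# Hypothetical prompt templates; adjust the wording to your own setup.
DIRECT_TEMPLATE = "{question}\nAnswer with only the final answer."
COT_TEMPLATE = (
    "{question}\nThink step by step, then give the final answer "
    "on the last line as 'Answer: <answer>'."
)


def extract_answer(completion: str) -> str:
    """Return the text after the last 'Answer:' marker, else the last non-empty line."""
    marker = "Answer:"
    if marker in completion:
        return completion.rsplit(marker, 1)[1].strip()
    lines = [ln.strip() for ln in completion.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def accuracy(model: Callable[[str], str], template: str,
             examples: List[Example]) -> float:
    """Exact-match accuracy of one prompt template on one task's examples."""
    if not examples:
        return 0.0
    correct = sum(
        extract_answer(model(template.format(question=q))) == gold
        for q, gold in examples
    )
    return correct / len(examples)


def choose_strategy(model: Callable[[str], str],
                    tasks: Dict[str, List[Example]],
                    margin: float = 0.0) -> Dict[str, str]:
    """Pick 'cot' for a task only if it beats direct prompting by more than margin."""
    choice = {}
    for name, examples in tasks.items():
        direct = accuracy(model, DIRECT_TEMPLATE, examples)
        cot = accuracy(model, COT_TEMPLATE, examples)
        # Ties go to direct prompting, which is shorter and cheaper.
        choice[name] = "cot" if cot > direct + margin else "direct"
    return choice
```

Calling `choose_strategy(model, {"arithmetic": [...], "lookup": [...]})` returns a mapping from task name to `"cot"` or `"direct"`. With small samples, a nonzero `margin` helps avoid switching to CoT on differences that are likely noise.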
Journey Context:
CoT is often treated as a universal accuracy booster. However, for tasks the model already answers reliably with a direct prompt, forcing CoT introduces a longer reasoning chain in which the model can 'overthink' and talk itself out of the correct answer. A longer chain also gives errors more room to compound: one flawed intermediate step can carry through to a wrong conclusion. Additionally, CoT can be unfaithful: the written reasoning may rationalize an answer the model already 'wanted' to output rather than reflect how it actually arrived at it.
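To see whether CoT is talking the model out of correct answers on a given task, it can help to count answer flips rather than look only at aggregate accuracy. The sketch below assumes you have already collected final answers from both prompt styles for the same questions; `count_flips` is a hypothetical helper name, and exact string match is an assumed scoring rule.

```python
from typing import List, Tuple


def count_flips(direct_answers: List[str], cot_answers: List[str],
                gold: List[str]) -> Tuple[int, int]:
    """Return (regressions, rescues) across paired answers.

    regressions: direct prompting was right, CoT was wrong.
    rescues:     direct prompting was wrong, CoT was right.
    """
    if not (len(direct_answers) == len(cot_answers) == len(gold)):
        raise ValueError("all three lists must have the same length")
    triples = list(zip(direct_answers, cot_answers, gold))
    regressions = sum(d == g and c != g for d, c, g in triples)
    rescues = sum(d != g and c == g for d, c, g in triples)
    return regressions, rescues
```

A task where regressions outnumber rescues is a candidate for direct prompting, even if overall CoT accuracy looks acceptable.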
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:26:45.474835+00:00— report_created — created