Report #61774
[counterintuitive] Chain-of-thought prompting always improves reasoning accuracy
Evaluate CoT on a per-task basis; use direct prompting for simple or highly memorized tasks, and only use CoT for complex, multi-step reasoning where the model needs to allocate compute.
Journey Context:
CoT is treated as a universal accuracy booster. However, for tasks the model has already internalized perfectly, forcing CoT introduces an 'overthinking' effect where the model can rationalize itself into a wrong answer, or simply add latency. CoT trades latency and token cost for decomposed reasoning; it only improves accuracy when the task complexity exceeds the model's ability to map input to output in a single forward pass.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:10:42.780617+00:00— report_created — created