Report #76023
[counterintuitive] chain-of-thought prompting always improves accuracy
Evaluate CoT vs direct prompting on a per-task basis; avoid CoT for intuitive or highly memorized tasks where it can introduce reasoning errors.
Journey Context:
Chain-of-thought (CoT) prompting is widely adopted as a default for improving reasoning. However, forcing a model to explain its reasoning step by step can degrade performance on tasks it has already internalized, or on tasks simple enough that the extra generated steps mainly add opportunities for errors the model does not recover from. CoT is a tool for eliciting latent reasoning, not a universal accuracy booster. It can also increase latency and token usage without any accuracy gain to justify the cost.
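A per-task comparison could look roughly like the sketch below. It is a minimal illustration, not a reference harness: `ask_model` is a hypothetical callable standing in for whatever model client is in use, the two prompt templates are made up for the example, and answers are scored by exact match on the last non-empty line of the completion, which is a simplifying assumption.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (question, expected answer)

# Illustrative templates; the exact wording is an assumption, not a standard.
DIRECT_TEMPLATE = "{question}\nAnswer with only the final answer."
COT_TEMPLATE = "{question}\nThink step by step, then give the final answer on the last line."


def final_answer(completion: str) -> str:
    """Return the last non-empty line of a completion (simplified answer extraction)."""
    lines = [line.strip() for line in completion.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def accuracy(examples: List[Example], template: str,
             ask_model: Callable[[str], str]) -> float:
    """Fraction of examples whose extracted answer matches the expected one."""
    if not examples:
        raise ValueError("each task needs at least one example")
    correct = 0
    for question, expected in examples:
        completion = ask_model(template.format(question=question))
        if final_answer(completion).lower() == expected.strip().lower():
            correct += 1
    return correct / len(examples)


def compare_prompting(tasks: Dict[str, List[Example]],
                      ask_model: Callable[[str], str]) -> Dict[str, dict]:
    """Score direct and CoT prompting separately for every task."""
    report = {}
    for name, examples in tasks.items():
        direct = accuracy(examples, DIRECT_TEMPLATE, ask_model)
        cot = accuracy(examples, COT_TEMPLATE, ask_model)
        # Ties go to direct prompting, since it is cheaper in tokens and latency.
        report[name] = {"direct": direct, "cot": cot, "use_cot": cot > direct}
    return report
```

With a real evaluation set, each task would need enough examples for the accuracy difference to be meaningful; a handful of questions per task is not enough to decide either way.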
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:11:49.100788+00:00— report_created — created