Report #43034
[counterintuitive] Does chain-of-thought prompting always improve accuracy?
Evaluate zero-shot vs. CoT on a representative test set; avoid CoT for simple tasks or tasks where verbalizing intuition degrades performance.
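A minimal sketch of that zero-shot vs. CoT comparison. It assumes a `model` callable that maps a prompt string to a completion string, so you would wrap whatever client you use. The prompt templates, the `Answer:` marker, exact-match scoring, and using character counts as a rough stand-in for tokens are all illustrative assumptions. `stub_model` is a hypothetical fake that only demonstrates the call pattern.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Example:
    question: str
    answer: str


# Hypothetical prompt templates (assumed wording; tune for your model and task).
ZERO_SHOT = "Q: {q}\nReply with only the final answer."
COT = "Q: {q}\nThink step by step, then end with a line 'Answer: <final answer>'."


def extract_answer(completion: str) -> str:
    """Return the text after the last 'Answer:' marker, else the last non-empty line."""
    if "Answer:" in completion:
        return completion.rsplit("Answer:", 1)[1].strip()
    lines = [ln.strip() for ln in completion.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def evaluate(examples: Sequence[Example], model: Callable[[str], str], template: str) -> dict:
    """Score one prompting strategy: exact-match accuracy and mean output length in characters."""
    correct, chars = 0, 0
    for ex in examples:
        out = model(template.format(q=ex.question))
        chars += len(out)  # character count is only a rough proxy for output tokens
        if extract_answer(out).lower() == ex.answer.strip().lower():
            correct += 1
    n = len(examples)
    return {"accuracy": correct / n, "mean_chars": chars / n}


def compare(examples: Sequence[Example], model: Callable[[str], str]) -> dict:
    """Run both strategies on the same test set so the results are directly comparable."""
    return {
        "zero_shot": evaluate(examples, model, ZERO_SHOT),
        "cot": evaluate(examples, model, COT),
    }


def stub_model(prompt: str) -> str:
    """Hypothetical deterministic fake: answers directly, but 'reasons' its way to a wrong answer under CoT."""
    if "step by step" in prompt:
        return "Let me reason.\nAnswer: 5"
    return "4"


if __name__ == "__main__":
    test_set = [Example("What is 2+2?", "4")]
    print(compare(test_set, stub_model))
```

With a real model, draw the test set from the same distribution the prompt will see in production. If CoT does not beat zero-shot by enough to justify its longer outputs, stay zero-shot.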
Journey Context:
CoT is widely assumed to be a universal accuracy booster because it forces step-by-step reasoning. However, published evaluations have found that CoT can hurt performance on tasks that depend on fast, intuitive judgment, or where the model's verbalized reasoning steers it toward a wrong answer (inverse scaling). If a task does not require multi-step logic, CoT adds output tokens, which raises latency and cost, and it can lead the model astray through post-hoc rationalization.
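Aggregate accuracy can hide this failure mode, because CoT may fix some items while breaking others. The following hedged sketch assumes you already have per-item correctness for both runs (booleans aligned by item) and per-item output token counts. It counts wrong-to-right and right-to-wrong flips, applies an exact McNemar test to check whether the difference exceeds noise, and reports the token overhead. McNemar is a standard paired test chosen here for illustration; the report does not prescribe it. The function names and the toy data are hypothetical.

```python
from math import comb
from typing import Sequence


def paired_flips(zero_shot_correct: Sequence[bool], cot_correct: Sequence[bool]) -> tuple:
    """Count items CoT fixed (wrong -> right) and items CoT broke (right -> wrong)."""
    if len(zero_shot_correct) != len(cot_correct):
        raise ValueError("both runs must cover the same items in the same order")
    pairs = list(zip(zero_shot_correct, cot_correct))
    fixed = sum(1 for z, c in pairs if not z and c)
    broken = sum(1 for z, c in pairs if z and not c)
    return fixed, broken


def mcnemar_exact_p(fixed: int, broken: int) -> float:
    """Two-sided exact McNemar p-value: binomial test on the discordant pairs with p = 0.5."""
    n = fixed + broken
    if n == 0:
        return 1.0  # no discordant items, so no evidence of a difference
    k = min(fixed, broken)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def token_overhead(zero_shot_tokens: Sequence[int], cot_tokens: Sequence[int]) -> float:
    """Ratio of mean CoT output length to mean zero-shot output length."""
    mean_zs = sum(zero_shot_tokens) / len(zero_shot_tokens)
    mean_cot = sum(cot_tokens) / len(cot_tokens)
    return mean_cot / mean_zs


if __name__ == "__main__":
    # Made-up toy data, purely to show the call pattern.
    zs = [True, True, False, True, True, False, True, True]
    cot = [True, False, True, True, False, False, True, False]
    fixed, broken = paired_flips(zs, cot)
    print(fixed, broken, mcnemar_exact_p(fixed, broken))  # 1 3 0.625
    print(token_overhead([5, 4, 6], [120, 90, 150]))  # 24.0
```

With so few discordant items, even a 3-to-1 split toward "broken" is not significant. On a real test set, a large `broken` count combined with a high token overhead is the pattern this report warns about.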
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:42:26.223868+00:00— report_created — created