Report #46115
[counterintuitive] Chain-of-thought prompting always improves LLM accuracy
Evaluate CoT vs direct answering per task; use direct prompting for simple, intuitive tasks and CoT only for complex reasoning or math.
Journey Context:
CoT is often treated as a universal accuracy booster. However, forcing a model to explain its reasoning on tasks it has already internalized can introduce 'over-thinking' errors: the generated reasoning steps can mislead the model or lead it to rationalize an incorrect answer. CoT is a tool for eliciting reasoning capabilities that exist but aren't triggered by default; it does not improve every task.
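A minimal sketch of the per-task comparison recommended above, assuming a hypothetical `ask_model` callable that takes a prompt string and returns the model's text response. The two prompt templates, the function names, and the "last non-empty line is the answer" rule are illustrative assumptions, not part of any specific library. Exact-match scoring is a simplification that would need adapting for free-form answers.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical prompt templates; adjust the wording to the model in use.
DIRECT_TEMPLATE = "{question}\nAnswer with only the final answer."
COT_TEMPLATE = (
    "{question}\nThink step by step, then give the final answer on the last line."
)


def final_answer(response: str) -> str:
    """Treat the last non-empty line as the answer (a simplifying assumption)."""
    lines = [line.strip() for line in response.splitlines() if line.strip()]
    return lines[-1] if lines else ""


def accuracy(
    template: str,
    examples: Iterable[Tuple[str, str]],
    ask_model: Callable[[str], str],
) -> float:
    """Fraction of (question, expected) pairs answered with an exact match."""
    examples = list(examples)
    if not examples:
        return 0.0
    correct = sum(
        final_answer(ask_model(template.format(question=question))) == expected
        for question, expected in examples
    )
    return correct / len(examples)


def choose_prompting(
    examples: Iterable[Tuple[str, str]],
    ask_model: Callable[[str], str],
) -> Dict[str, object]:
    """Score both prompting styles on one task's labeled examples."""
    examples = list(examples)
    direct = accuracy(DIRECT_TEMPLATE, examples, ask_model)
    cot = accuracy(COT_TEMPLATE, examples, ask_model)
    # Prefer direct prompting on ties: it is cheaper and leaves less room
    # for the over-thinking errors described above.
    choice = "cot" if cot > direct else "direct"
    return {"direct": direct, "cot": cot, "choice": choice}
```

Run this once per task type on a small labeled sample, and keep CoT only for the tasks where it measurably wins. With small samples the difference between the two scores can be noise, so a larger evaluation set or repeated runs may be needed before committing to either style.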
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:52:48.432577+00:00 — report_created — created