Report #22373
[counterintuitive] Chain-of-thought prompting always improves accuracy
Apply chain-of-thought (CoT) selectively: use it for multi-step reasoning, math, and logic tasks, and avoid it for simple classification, retrieval, or tasks where the model already performs well. Validate CoT reasoning chains independently: a correct output does not imply correct reasoning, and a reasoning chain is not necessarily a faithful explanation of the model's computation.
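The selective approach above could be sketched as a small prompt router. This is a minimal illustration, not a prescribed implementation: the `TaskType` categories, the `COT_TASKS` set, the suffix wording, and the `build_prompt` helper are all hypothetical names chosen for this example, and how a task gets its type is assumed to be decided elsewhere.

```python
from enum import Enum


class TaskType(Enum):
    """Hypothetical task categories, mirroring the ones named in this report."""
    MATH = "math"
    LOGIC = "logic"
    MULTI_STEP = "multi_step"
    CLASSIFICATION = "classification"
    RETRIEVAL = "retrieval"


# Assumption: only these categories get a CoT instruction.
COT_TASKS = {TaskType.MATH, TaskType.LOGIC, TaskType.MULTI_STEP}

# Assumption: illustrative wording; tune for the model in use.
COT_SUFFIX = "Think through the problem step by step, then give the final answer on its own line."
DIRECT_SUFFIX = "Answer directly with no explanation."


def build_prompt(question: str, task_type: TaskType) -> str:
    """Append a CoT instruction only for task types assumed to need it."""
    suffix = COT_SUFFIX if task_type in COT_TASKS else DIRECT_SUFFIX
    return f"{question}\n\n{suffix}"


if __name__ == "__main__":
    print(build_prompt("What is 17 * 24?", TaskType.MATH))
    print(build_prompt("Is this review positive or negative? 'Great phone.'", TaskType.CLASSIFICATION))
```

Keeping the routing decision in one place makes it easy to compare accuracy with and without CoT per task type before committing to either.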
Journey Context:
CoT is powerful but not universally beneficial. On tasks where models already have strong intuitive capabilities, deliberation adds opportunities for error. More insidiously, CoT can amplify biases through motivated reasoning: the model constructs a plausible chain that justifies a predetermined answer rather than genuinely reasoning toward it. Research shows CoT reasoning chains are often unfaithful: the model may reach the right answer for the wrong reasons, or produce a coherent chain that doesn't reflect its actual computation. CoT also increases token count, latency, and cost. The trade-off is worthwhile only when a task genuinely requires sequential reasoning steps.
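One narrow way to check a chain independently of its final answer is to recompute any arithmetic it states. The sketch below is an assumption-laden illustration: `find_bad_steps` is a hypothetical helper, and it only recognises simple `a op b = c` steps. It cannot tell whether the chain reflects the model's actual computation; it only catches steps that are internally wrong.

```python
import operator
import re

_OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}

# Matches simple binary steps such as "17 * 4 = 68".
# Limitation: compound lines like "17 * 24 = 17 * 20 + 17 * 4" are not handled
# and would be misread, so this is only suitable for one-operation steps.
_STEP = re.compile(
    r"(-?\d+(?:\.\d+)?)\s*([+\-*/])\s*(-?\d+(?:\.\d+)?)\s*=\s*(-?\d+(?:\.\d+)?)"
)


def find_bad_steps(chain: str, tol: float = 1e-9) -> list[str]:
    """Return the arithmetic steps in `chain` whose stated result is wrong."""
    bad = []
    for match in _STEP.finditer(chain):
        left, op, right, claimed = match.groups()
        if op == "/" and float(right) == 0:
            bad.append(match.group(0))  # division by zero is never a valid step
            continue
        actual = _OPS[op](float(left), float(right))
        if abs(actual - float(claimed)) > tol:
            bad.append(match.group(0))
    return bad


if __name__ == "__main__":
    # The final answer (408) is correct for 17 * 24, but two steps are not.
    chain = "17 * 20 = 340\n17 * 4 = 78\n340 + 78 = 408"
    print(find_bad_steps(chain))
```

In this made-up chain the final answer happens to be right while two intermediate steps are wrong, which is the "right answer for the wrong reasons" case: grading on the output alone would miss it.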
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T15:57:57.452636+00:00— report_created — created