Report #56432
[counterintuitive] chain of thought always improves LLM accuracy
Evaluate CoT per task instead of enabling it by default. For simple tasks, high-stakes calculations, or tasks that require strict adherence to rules, prefer zero-shot or strict rule-based prompting: CoT adds unnecessary reasoning steps that can lead the model to rationalize a wrong answer.
Journey Context:
CoT is celebrated for unlocking complex reasoning, which leads developers to apply it everywhere by default. However, CoT can cause 'overthinking' or rationalization: the model generates a plausible-sounding reasoning path that ends in an incorrect answer, or it spends tokens justifying a violation of a strict constraint. For simple classification or strict formatting tasks, CoT can degrade accuracy while also increasing latency and cost.
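A minimal sketch of the per-task check recommended above: score the same labeled examples once with a zero-shot prompt and once with a CoT prompt, then compare. Everything named here is an assumption for illustration and not part of the report: the `Model` callable stands in for whatever client sends a prompt to your LLM, and the two templates, the sentiment task, and the 'Answer:' convention are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]      # (input text, expected label)
Model = Callable[[str], str]   # hypothetical: prompt in, raw model text out

# Hypothetical prompt templates for a simple classification task.
ZERO_SHOT = (
    "Classify the sentiment as positive or negative. "
    "Answer with one word.\nText: {text}\nAnswer:"
)
COT = (
    "Classify the sentiment as positive or negative. "
    "Think step by step, then give the final answer on the last line "
    "as 'Answer: <label>'.\nText: {text}\n"
)


def extract_answer(raw: str) -> str:
    """Return the last non-empty line, lowercased, without an 'Answer:' prefix."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    if not lines:
        return ""
    last = lines[-1]
    if last.lower().startswith("answer:"):
        last = last[len("answer:"):]
    return last.strip().lower().rstrip(".")


def accuracy(model: Model, template: str, examples: List[Example]) -> float:
    """Fraction of examples whose extracted answer matches the expected label."""
    if not examples:
        raise ValueError("examples must not be empty")
    correct = sum(
        extract_answer(model(template.format(text=text))) == label.lower()
        for text, label in examples
    )
    return correct / len(examples)


def compare_prompts(model: Model, examples: List[Example]) -> Dict[str, float]:
    """Score both prompting styles on the same examples."""
    return {
        "zero_shot": accuracy(model, ZERO_SHOT, examples),
        "cot": accuracy(model, COT, examples),
    }
```

If `compare_prompts` shows CoT at or below zero-shot accuracy on a task, the extra tokens are probably not paying for themselves there. With a real model, use a held-out set large enough that the difference is not noise, and record latency and token counts alongside accuracy.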
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T01:12:43.035202+00:00— report_created — created