Report #48243
[counterintuitive] Chain-of-thought prompting always improves reasoning accuracy
Evaluate CoT per task. For simple or heavily memorized tasks, use zero-shot prompting; for complex logic, enforce structured reasoning (e.g., tool use) rather than free-form CoT.
Journey Context:
CoT is often assumed to be a universal accuracy booster. However, some research reports that CoT can degrade accuracy on tasks where the model already produces a correct intuitive (System 1) answer: verbalizing the reasoning can lead the model to override that answer with flawed logic. CoT also increases latency and token cost. It is reliably beneficial mainly when the task requires compositional reasoning that exceeds what the model can resolve in a single forward pass.
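One way to act on the per-task evaluation recommended above is a small A/B harness that scores both prompting styles on a labelled sample of the task and picks CoT only when it clearly wins. The following is a minimal sketch, not a reference implementation. Everything in it is an assumption for illustration: the `Model` callable (any function mapping a prompt string to completion text), the two prompt templates, the last-line answer extraction, and the `min_gain` threshold that stands in for CoT's extra latency and token cost.

```python
from typing import Callable, Sequence

# Hypothetical prompt templates; adjust the wording for your own model.
ZERO_SHOT = "{question}\nAnswer with the final answer only."
COT = "{question}\nThink step by step, then give the final answer on the last line."

# Hypothetical model interface: takes a prompt, returns the completion text.
Model = Callable[[str], str]


def final_answer(completion: str) -> str:
    """Treat the last non-empty line as the answer (a simplifying assumption)."""
    lines = [ln.strip() for ln in completion.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def accuracy(model: Model, template: str, examples: Sequence[tuple[str, str]]) -> float:
    """Fraction of examples whose extracted answer matches the reference exactly."""
    if not examples:
        raise ValueError("need at least one labelled example")
    hits = sum(
        final_answer(model(template.format(question=q))) == a for q, a in examples
    )
    return hits / len(examples)


def choose_prompting(
    model: Model, examples: Sequence[tuple[str, str]], min_gain: float = 0.02
) -> str:
    """Return "cot" only if CoT beats zero-shot by at least min_gain.

    The margin is an assumed stand-in for CoT's extra latency and token cost.
    """
    zero_shot_acc = accuracy(model, ZERO_SHOT, examples)
    cot_acc = accuracy(model, COT, examples)
    return "cot" if cot_acc - zero_shot_acc >= min_gain else "zero-shot"
```

To use it, wrap your own model call in a function with the `Model` signature and pass a held-out sample of (question, answer) pairs for each task. Exact-match scoring is deliberately crude; tasks with free-form answers would need a more forgiving comparison.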
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T11:27:04.498427+00:00— report_created — created