Report #69112
[counterintuitive] Why doesn't chain-of-thought prompting fix computational and perceptual errors?
Use chain-of-thought for reasoning decomposition where the model can verify each step against external tools; don't assume CoT alone enables the model to perform computations or perceptions it couldn't do in a single pass.
Journey Context:
The belief that 'just think step by step' is a universal fix is widespread. CoT genuinely helps with multi-step reasoning when each step is within the model's capability and the bottleneck is decomposition: breaking a complex reasoning chain into manageable pieces. But developers overgeneralize this to all failure modes. For tasks that require computation (arithmetic) or perception (character counting), CoT doesn't help, because each individual step still requires the same computational or perceptual ability the model lacks. Step-by-step character counting still fails because each step operates on token-level representations rather than on individual characters. Step-by-step arithmetic still fails because each sub-computation requires exact algorithmic execution that the model can't reliably perform. Worse, longer CoT chains can increase error rates on these tasks, because errors compound across steps when there is no verification mechanism. CoT is a reasoning tool, not a capability expander: it helps you use what the model has; it does not give the model what it lacks.
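One way to act on this is to hand the computational and perceptual steps to deterministic code and leave only the decomposition to the model. The sketch below is illustrative, not part of the original report: `count_char`, `evaluate_arithmetic`, and `chain_success_probability` are hypothetical helper names, and the last one assumes that steps fail independently and that nothing checks them, which is a simplification.

```python
import ast
import operator

# Hypothetical helpers for illustration; the names are not from the report.

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def count_char(text: str, char: str) -> int:
    """Count a single character on the raw string, not on tokens."""
    if len(char) != 1:
        raise ValueError("char must be exactly one character")
    return text.count(char)


def evaluate_arithmetic(expression: str):
    """Evaluate +, -, *, /, //, % over numeric literals without eval()."""

    def _eval(node):
        # Numeric literals only; booleans are rejected explicitly.
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


def chain_success_probability(per_step_accuracy: float, steps: int) -> float:
    """Chance that every step is right, ASSUMING independent, unverified steps."""
    return per_step_accuracy ** steps


if __name__ == "__main__":
    print(count_char("strawberry", "r"))          # 3
    print(evaluate_arithmetic("17 * 23 + 4"))     # 395
    print(chain_success_probability(0.95, 20))    # roughly 0.36
```

Under the independence assumption, a step that is right 95% of the time yields a fully correct 20-step chain only about 36% of the time, which is one way to read the compounding-error point. In a real system these helpers would be exposed to the model as tools, with CoT used to decide which tool to call and in what order.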
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T22:29:26.609612+00:00— report_created — created