Report #72095
[counterintuitive] Why does the model get basic arithmetic wrong even with chain-of-thought prompting?
Always route arithmetic computation to code execution or calculator tools. Chain-of-thought improves problem decomposition but does not give the model the ability to compute; it only helps the model plan which computations to perform.
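A minimal sketch of what a calculator tool could look like, assuming the model emits a plain arithmetic expression as a string and the host application evaluates it. The function name `calculate` and the set of allowed operators are assumptions made for illustration, not part of any specific tool-calling API. The expression is parsed with Python's `ast` module rather than passed to `eval`, so only numeric literals and whitelisted operators are accepted.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
# Exponentiation is deliberately left out to avoid huge results.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str):
    """Hypothetical calculator tool: evaluate basic arithmetic exactly."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept only int/float literals (bools and strings are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


# The model plans the step; the tool performs it.
print(calculate("847291 * 39201"))  # 33214654491
```

Python integers have arbitrary precision, so integer products of any size are exact here; expressions using `/` return floats and inherit floating-point rounding.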
Journey Context:
The common belief is that chain-of-thought (CoT) prompting fixes math errors by letting the model 'show its work.' CoT genuinely helps with reasoning strategy, such as breaking a word problem into steps. But each individual arithmetic step (e.g., 847291 × 39201) is still produced by pattern matching against training data, not by executing an algorithm. For numbers outside the training distribution, the model has no reliable computation mechanism. Larger models reduce the error rate but never eliminate it: the error rate on arbitrary multi-digit multiplication does not reach zero at any scale tested. The model is a pattern completer, not a calculator. CoT is a planning tool, not a computation tool.
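Because each arithmetic step in a CoT transcript is generated rather than computed, one option is to recompute the steps after the fact. The sketch below assumes the reasoning text states products in the form `a × b = c`; the function name `find_wrong_products` and the sample transcript are invented for illustration and are not real model output.

```python
import re

# Matches claims like "12 × 34 = 408"; also accepts "x" or "*" as the
# multiplication sign and thousands separators in the numbers.
_CLAIM = re.compile(r"(\d[\d,]*)\s*[×x*]\s*(\d[\d,]*)\s*=\s*(\d[\d,]*)")


def _to_int(text: str) -> int:
    return int(text.replace(",", ""))


def find_wrong_products(reasoning: str):
    """Return (a, b, claimed, actual) for each incorrect multiplication claim."""
    errors = []
    for a, b, claimed in _CLAIM.findall(reasoning):
        a, b, claimed = _to_int(a), _to_int(b), _to_int(claimed)
        actual = a * b  # exact integer arithmetic
        if actual != claimed:
            errors.append((a, b, claimed, actual))
    return errors


# Invented transcript: the first step is right, the second is off by one digit.
transcript = (
    "Step 1: 12 × 34 = 408. "
    "Step 2: 847291 × 39201 = 33,214,654,391."
)
print(find_wrong_products(transcript))
# [(847291, 39201, 33214654391, 33214654491)]
```

A check like this only catches claims written in the expected pattern, so it complements routing the computation to a tool rather than replacing it.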
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:35:44.913029+00:00— report_created — created