Report #40895
[counterintuitive] chain of thought doesn't fix arithmetic calculation errors
Delegate all precise numerical computation to code execution. Use chain-of-thought for deciding WHAT to compute, but never let the model perform the actual arithmetic, floating-point operations, or any calculation requiring exact results.
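As a minimal sketch of what "delegating" can look like, the snippet below does the arithmetic in a deterministic Python runtime instead of asking the model for the digits. The specific numbers and variable names are illustrative assumptions, not part of the original report.

```python
from decimal import Decimal
from fractions import Fraction

# Python ints are arbitrary precision, so large products are exact.
big_product = 123456789 * 987654321

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts.
float_sum = 0.1 + 0.2

# Decimal (built from strings) keeps base-10 values exact.
decimal_sum = Decimal("0.1") + Decimal("0.2")

# Fraction keeps rational results exact, with no rounding at all.
one = Fraction(1, 3) * 3

print(big_product)
print(float_sum == 0.3, decimal_sum == Decimal("0.3"), one == 1)
```

Which exact type to use (`int`, `Decimal`, or `Fraction`) depends on the calculation; the point is only that the runtime, not the model, produces the digits.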
Journey Context:
The common belief is that arithmetic errors are reasoning failures that chain-of-thought prompting can fix. The reality is more fundamental: transformers are pattern matchers, not computational engines. Multi-digit multiplication requires serial carry operations that must each be executed exactly, and that process does not map cleanly onto the parallel attention mechanism. The model learns statistical regularities about common arithmetic results (it knows 7 x 8 = 56 because it has seen this thousands of times) but cannot reliably compute 7482 x 3917 because that specific product is unlikely to appear in its training data. Chain-of-thought helps decompose the problem into steps, but each step still relies on pattern matching, so a single wrong digit or carry in any intermediate step propagates to the final answer. The model will confidently produce wrong answers that look numerically plausible. The Program-Aided Language Models (PAL) approach, in which the model writes code and a runtime executes it, is the correct architecture: the LLM handles reasoning about what to compute, and a deterministic runtime handles the actual computation.
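The split described above can be sketched as follows. This is a simplified illustration, not the PAL reference implementation: `model_output` is a hypothetical stand-in for whatever expression the LLM emits, and `evaluate_expression` is an assumed helper that accepts only integer literals and basic arithmetic operators, so that model-written text is never passed to a bare `eval`.

```python
import ast
import operator
from fractions import Fraction

# Whitelisted operators; anything else in the expression is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate_node(node):
    """Recursively evaluate a whitelisted AST node to an exact Fraction."""
    if (
        isinstance(node, ast.Constant)
        and isinstance(node.value, int)
        and not isinstance(node.value, bool)
    ):
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _evaluate_node(node.left)
        right = _evaluate_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate_node(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


def evaluate_expression(expression: str) -> Fraction:
    """Deterministically evaluate an arithmetic expression written by the model."""
    tree = ast.parse(expression, mode="eval")
    return _evaluate_node(tree.body)


# Hypothetical model output: the LLM decides WHAT to compute...
model_output = "7482 * 3917"
# ...and the runtime performs the actual computation.
result = evaluate_expression(model_output)
print(result)
```

Restricting the evaluator to a small whitelist is one design choice among several; a sandboxed interpreter or a dedicated tool call would serve the same purpose, as long as the digits come from the runtime rather than from the model.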
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T23:06:48.996690+00:00— report_created — created