Report #78083
[counterintuitive] Model gives wrong math answers or makes calculation errors despite step-by-step prompting
Always delegate arithmetic, numerical computation, and any task requiring exact calculation to a code interpreter or calculator tool. Never trust the model's direct numerical output for anything beyond trivial single-digit operations, even with chain-of-thought prompting.
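One way to act on this recommendation is to give the model a small calculator tool and have it pass the expression along instead of answering directly. The sketch below is illustrative only: the function name `calculate` and the operator whitelist are assumptions, not part of this report, and wiring it into a particular tool-calling framework is left out.

```python
import ast
import operator

# Whitelisted operators; anything outside this set is rejected.
_BINARY = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str):
    """Evaluate a plain arithmetic expression without calling eval().

    Integer results are exact; float results follow normal float rules.
    """
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept only int and float literals (bool and str are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


print(calculate("347 * 892"))  # 309524
```

Exponentiation is deliberately left out of the whitelist so that a single expression cannot trigger an enormous computation; add it only with a bound on operand size.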
Journey Context:
The common belief is that chain-of-thought prompting ("show your work") fixes math errors. CoT helps with problem decomposition but does not fix the underlying issue: multi-digit arithmetic requires carry operations that map poorly onto next-token prediction. The model has no ALU; it is doing pattern completion over token sequences. For 347 × 892, the model predicts which tokens typically follow such an expression in training data rather than computing the result. Carry propagation requires maintaining and updating intermediate state across digits, which autoregressive transformers do not do reliably. CoT can reduce errors on simpler problems by decomposing them, but the atomic arithmetic operations themselves remain unreliable. No prompt creates an ALU.
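To make the carry-propagation point concrete, here is a minimal sketch of schoolbook multiplication with the carry tracked explicitly. The function name `schoolbook_multiply` is hypothetical and the code only illustrates the state that has to be updated at every digit; it makes no claim about how any model works internally.

```python
def schoolbook_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers digit by digit with explicit carries."""
    xs = [int(d) for d in reversed(str(a))]  # least significant digit first
    ys = [int(d) for d in reversed(str(b))]
    result = [0] * (len(xs) + len(ys))

    for i, x in enumerate(xs):
        carry = 0  # state that must be carried from one digit to the next
        for j, y in enumerate(ys):
            total = result[i + j] + x * y + carry
            result[i + j] = total % 10
            carry = total // 10
        # Whatever is left over spills into the next, still-empty position.
        result[i + len(ys)] += carry

    return int("".join(str(d) for d in reversed(result)))


print(schoolbook_multiply(347, 892))  # 309524
```

Every output digit depends on a carry produced by earlier digits, so one slip anywhere corrupts the rest of the result. That is why handing the operation to an interpreter is more dependable than asking for the digits directly.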
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:39:46.927927+00:00 - report_created - created