Report #40309
[counterintuitive] LLM makes errors on multi-digit arithmetic despite step-by-step prompting
Always delegate numerical computation to code execution, calculator tools, or external arithmetic; never trust the LLM's direct text output for any computation beyond simple single-digit operations
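A minimal sketch of what delegating could look like, assuming the model is asked to emit an arithmetic expression as text and the host application evaluates it. The names evaluate_arithmetic and _eval_node are hypothetical helpers written for this illustration, not part of any LLM framework. Only integer literals and a small set of operators are accepted.

```python
import ast
import operator

# Hypothetical whitelist of operators; anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _eval_node(node):
    """Recursively evaluate a parsed expression node using exact integer math."""
    # Accept plain integer literals only (bools and floats are rejected).
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _eval_node(node.left)
        right = _eval_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError("unsupported expression")


def evaluate_arithmetic(expression: str) -> int:
    """Evaluate an integer arithmetic expression without calling eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)


# The model supplies the expression; the interpreter supplies the digits.
print(evaluate_arithmetic("3847 * 2918"))
```

The design choice here is that the model's text is treated as a request for computation rather than as the result, so the digits in the final answer never come from token prediction.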
Journey Context:
Developers often try to fix arithmetic errors with chain-of-thought (CoT) prompting, assuming the model just needs to show its work. CoT can help with simple arithmetic by decomposing it into familiar patterns, but multi-digit computation fails for a more fundamental reason: autoregressive token prediction cannot reliably implement carry propagation. When computing 3847 × 2918, each digit of the answer depends on carries from the less significant positions to its right, yet the model writes the answer left to right, most significant digit first, so those carries have not been worked out when the leading digits are emitted. A single forward pass would have to resolve the entire right-to-left carry chain before committing to the first digit. This isn't a knowledge gap or a reasoning failure; it's a computational architecture mismatch: the problem needs intermediate results in an order that sequential token prediction does not produce. No amount of prompting creates a working ALU inside a transformer.
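The carry dependency can be illustrated with a small sketch. It uses schoolbook addition rather than multiplication to keep the example short, which is a simplification chosen for this illustration; add_with_carries is a hypothetical helper. It processes columns from the least significant digit upward and records the carry out of each column.

```python
def add_with_carries(a: int, b: int):
    """Schoolbook addition of two non-negative integers.

    Returns (total, carries), where carries[i] is the carry out of
    column i and column 0 is the least significant digit.
    """
    xs = [int(d) for d in reversed(str(a))]
    ys = [int(d) for d in reversed(str(b))]
    width = max(len(xs), len(ys))
    xs += [0] * (width - len(xs))
    ys += [0] * (width - len(ys))

    digits, carries, carry = [], [], 0
    for x, y in zip(xs, ys):
        # Each column needs the carry from the column to its right.
        carry, digit = divmod(x + y + carry, 10)
        digits.append(digit)
        carries.append(carry)
    if carry:
        digits.append(carry)

    total = int("".join(str(d) for d in reversed(digits)))
    return total, carries


# Changing only the units digit of the second operand changes the
# leading digit of the result, because the carry ripples leftward.
print(add_with_carries(4999, 0))
print(add_with_carries(4999, 1))
```

In the second call the leading digit changes from 4 to 5 purely because of the units column, which is the last thing a left-to-right writer would get to.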
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:07:52.737567+00:00— report_created — created