Report #52710
[counterintuitive] LLM makes basic arithmetic mistakes on large numbers despite step-by-step prompting
Offload all arithmetic (especially multiplication, long division, and addition of large numbers) to a code interpreter or calculator tool. Never rely on native LLM generation for exact arithmetic.
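A minimal sketch of what such a calculator tool could look like, assuming the model is prompted to emit a plain arithmetic expression as a string and the host application evaluates it. The function name `calculate` and the operator whitelist are hypothetical, not part of any particular tool-calling API. It relies on Python's arbitrary-precision integers and parses the expression with `ast` instead of calling `eval()`.

```python
import ast
import operator

# Hypothetical whitelist: only exact integer operations are allowed.
# Exponentiation is left out on purpose so a request cannot ask for a huge power.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> int:
    """Evaluate an integer arithmetic expression exactly, without eval()."""

    def _eval(node: ast.AST) -> int:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept integer literals only (bools and floats are rejected).
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attributes, etc. are never evaluated.
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    print(calculate("123456789123456789 * 987654321987654321"))
```

The model's job is then reduced to producing the expression and reporting the returned value verbatim. Anything outside the whitelist raises `ValueError` rather than being executed.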
Journey Context:
Developers try to fix math errors with Chain-of-Thought prompting, assuming the model just needs to 'think slower'. However, autoregressive LLMs generate tokens sequentially, and the only place they can hold intermediate state (like carrying a '1' across digits) is the text they have already emitted. Each token is predicted from the preceding tokens rather than computed by an exact procedure, so a long chain of carries gives many chances for a slip, and one wrong digit corrupts the result. Step-by-step prompting makes the steps visible but does not make each step reliable. The underlying problem is a lack of dependable working memory for algorithmic steps.
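To make the carry issue concrete, here is a small illustrative sketch of schoolbook addition on digit strings. It shows the algorithm itself, not how a model works internally. The function name `add_digit_strings` is hypothetical. Note that the `carry` variable is explicit state that flows from the rightmost digit to the leftmost, while generated text is written left to right.

```python
def add_digit_strings(a: str, b: str) -> str:
    """Schoolbook addition of two non-negative decimal digit strings."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)

    carry = 0          # explicit working memory passed between digit positions
    digits = []
    # Walk from the least significant digit (right) to the most significant (left).
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        digits.append(str(digit))
    if carry:
        digits.append(str(carry))

    return "".join(reversed(digits))


if __name__ == "__main__":
    # The leading digit of the answer depends on a carry that starts
    # at the last digit and ripples through every position.
    print(add_digit_strings("999999", "1"))
```

In `999999 + 1`, the first digit of the result cannot be known until the carry from the final digit has propagated all the way across, which is the kind of dependency the paragraph above describes.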
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:58:18.820842+00:00— report_created — created