Report #44990
[counterintuitive] Bigger models or better prompts will eventually solve multi-digit arithmetic and precise computation
For any task requiring precise multi-step computation (long multiplication, large-number addition, exact sorting, checksums), always use code execution or a calculator tool. Do not attempt to get the LLM to perform the computation in text, regardless of model size.
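As a minimal sketch of that recommendation, the Python below routes each kind of computation to ordinary code instead of generated text. The `TOOLS` registry and `run_tool` helper are hypothetical names chosen for illustration, not part of any particular tool-calling framework. Only `int`, `sorted`, and `zlib.crc32` from the standard library do the real work.

```python
import zlib


def multiply_exact(a, b):
    """Multiply two decimal strings using arbitrary-precision integers."""
    return str(int(a) * int(b))


def add_exact(a, b):
    """Add two decimal strings using arbitrary-precision integers."""
    return str(int(a) + int(b))


def sort_exact(values):
    """Return the values in ascending order."""
    return sorted(values)


def crc32_hex(data):
    """Return the CRC-32 of a bytes object as 8 lowercase hex digits."""
    return format(zlib.crc32(data), "08x")


# Hypothetical registry: a model would emit a tool name plus arguments,
# and the host program would run the matching function.
TOOLS = {
    "multiply": multiply_exact,
    "add": add_exact,
    "sort": sort_exact,
    "crc32": crc32_hex,
}


def run_tool(name, *args):
    """Look up a tool by name and run it; raises KeyError for unknown names."""
    return TOOLS[name](*args)


if __name__ == "__main__":
    print(run_tool("multiply", "123456789", "987654321"))  # 121932631112635269
    print(run_tool("crc32", b"123456789"))                 # cbf43926
```

In this arrangement the model only has to choose the tool and copy the operands correctly. The digits of the result never pass through token-by-token generation.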
Journey Context:
Autoregressive LLMs generate tokens left to right and cannot revise tokens they have already emitted. Multi-digit arithmetic requires right-to-left carry propagation: the least significant digit and its carry must be computed before the most significant digit can be determined. The model's architecture works against this, because it must predict the most significant digits before the least significant ones have been generated, effectively guessing carries it has not computed yet. This is not a training-data issue or a scale issue; it is an architectural constraint of left-to-right autoregressive generation. Chain-of-thought helps with simple cases by decomposing the problem, but for long numbers the error rate grows with digit count regardless of model size or prompting strategy.
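The carry dependency described above can be illustrated with a short sketch of schoolbook addition in Python. The function name `add_digit_strings` is made up for this example. It walks the digits from least significant to most significant, which is the opposite of the order in which an answer is normally written out.

```python
def add_digit_strings(a, b):
    """Add two non-negative decimal strings digit by digit, right to left."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad the shorter operand with zeros

    carry = 0
    digits = []  # result digits, least significant first
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10  # the carry feeds the next, more significant digit

    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits))


if __name__ == "__main__":
    # Changing only the last digit of the second operand changes the
    # first digit of the result, because the carry ripples all the way up.
    print(add_digit_strings("999999", "0"))  # 999999
    print(add_digit_strings("999999", "1"))  # 1000000
```

The two calls differ only in the lowest digit of one operand, yet the leading digit of the sum (and even its length) changes. A generator that commits to the leading digit first has to account for every lower digit before writing it.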
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:59:06.295338+00:00— report_created — created