Report #47474
[counterintuitive] The model fails at multi-digit arithmetic and just needs a better prompt or more examples to get it right
Always delegate arithmetic (especially multi-digit addition, multiplication, and division) to code execution or a calculator tool. Never rely on the model's direct text output for numerical computation.
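A minimal sketch of what delegating to a calculator tool can look like, assuming a Python tool runtime. `calculator` is a hypothetical name, not any particular framework's tool API. It parses the expression with the standard-library `ast` module, accepts only integer literals and `+ - * /`, and evaluates with `fractions.Fraction`, so results are exact rather than floating-point approximations.

```python
import ast
import operator
from fractions import Fraction

# Only these operators are accepted; everything else (calls, names,
# attribute access, **) is rejected before evaluation.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculator(expression: str) -> Fraction:
    """Hypothetical calculator tool: exact +, -, *, / over integer literals."""

    def evaluate(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Expression):
            return evaluate(node.body)
        # bool is a subclass of int, so check the exact type.
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](evaluate(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    # mode="eval" accepts a single expression, not statements.
    return evaluate(ast.parse(expression, mode="eval"))


# Usage: the model writes the expression; the host runs it and returns the result.
print(calculator("245 + 378"))              # → 623
print(calculator("123456789 * 987654321"))  # → 121932631112635269
print(calculator("1 / 3"))                  # → 1/3 (exact, not 0.333...)
```

Exponentiation is deliberately left out so that an input such as `10 ** 10 ** 10` cannot stall the tool; it could be added back with a bound on the exponent. Decimal literals are also rejected in this sketch; supporting them would mean parsing the literal text into a `Fraction` or `decimal.Decimal` rather than going through a float.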
Journey Context:
Arithmetic failure looks like a reasoning gap that more examples or chain-of-thought should fix. It is actually an architectural incompatibility. Autoregressive models generate tokens left to right, which for numbers means most significant digit first. Standard arithmetic algorithms, however, work right to left: to add 245 + 378, you compute the ones place first (5 + 8 = 13, write 3, carry 1), then the tens place (4 + 7 + 1 = 12, write 2, carry 1), then the hundreds place (2 + 3 + 1 = 6), giving 623. The model must emit the hundreds digit before it has written out the carries from the ones and tens places. It cannot write those carries down first; any carry propagation has to happen implicitly inside the single forward pass that produces each digit.

Chain-of-thought can sometimes help by letting the model write intermediate steps, but the model is still approximating arithmetic from patterns in its training data, not executing an algorithm. For numbers outside its training distribution (large numbers, unusual precision), accuracy collapses. This is a hard architectural limit of left-to-right autoregressive generation applied to right-to-left algorithms.
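To make the carry dependency concrete, here is a minimal Python sketch of schoolbook addition; `column_add` is a hypothetical name chosen for illustration. It walks the columns right to left exactly as in the 245 + 378 example and records the carry leaving each column. Comparing 499 + 500 with 499 + 501 shows that the leading digit, which a left-to-right generator must commit to first, is decided by the ones column.

```python
def column_add(a: str, b: str) -> tuple[str, list[int]]:
    """Add two non-negative decimal strings the schoolbook way, right to left.

    Returns the sum and the carry coming out of each column,
    listed from the ones column upward.
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)  # pad with leading zeros to align columns
    digits: list[str] = []
    carries: list[int] = []
    carry = 0
    # Walk from the least significant (rightmost) column to the most significant.
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
        carries.append(carry)
    if carry:
        digits.append(str(carry))  # a final carry adds a new leading digit
    # Digits were produced least significant first; reverse for reading order.
    return "".join(reversed(digits)), carries


print(column_add("245", "378"))  # → ('623', [1, 1, 0])
print(column_add("499", "500"))  # → ('999', [0, 0, 0])
print(column_add("499", "501"))  # → ('1000', [1, 1, 1])
```

Changing only the ones digit of the second operand flips the first output digit from 9 to 1 and adds a digit to the answer: information has to flow from the rightmost column to the leftmost before the first character can be written.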
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T10:09:45.713285+00:00— report_created — created