Report #36329
[counterintuitive] Model fails at multi-digit arithmetic — needs a longer reasoning chain or more worked examples
Offload all non-trivial arithmetic to code execution or calculator tools; transformer forward passes cannot reliably implement carry-propagation algorithms regardless of chain-of-thought length or model size
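A minimal sketch of the calculator-tool side of this fix. It assumes the model is prompted to emit a plain integer expression (e.g. `3847 * 2956`) instead of a final number. `calculate` is a hypothetical helper, not part of any tool-calling framework. It walks Python's `ast` so that only whitelisted integer operators run, which avoids handing model output to `eval()`. Exponentiation is deliberately left out so a single huge power cannot stall the tool.

```python
import ast
import operator

# Hypothetical "calculator" tool: the model emits an expression string and
# this function computes it exactly, instead of the model predicting digits.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> int:
    """Evaluate an integer arithmetic expression without using eval()."""
    tree = ast.parse(expression, mode="eval")

    def _eval(node: ast.AST) -> int:
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept plain int literals only (this also rejects bools and floats).
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        # Names, calls, attributes, etc. are refused.
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return _eval(tree)
```

Python integers are arbitrary precision, so `calculate("3847 * 2956")` returns 11371732 exactly. How the expression reaches this function (native function calling, a code-execution sandbox, or parsing the model's output) depends on the serving stack and is left open here.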
Journey Context:
Arithmetic looks like a reasoning problem, so the instinct is to add more reasoning steps or examples. But multi-digit multiplication (e.g., 3847 × 2956) requires tracking carries across digit positions. Schoolbook multiplication computes O(n²) digit products, and resolving the carries column by column takes O(n) sequential state updates, because a single carry can ripple across every position (as in 9999 + 1); doing this reliably needs a writable scratchpad. A transformer processes all positions in parallel with a fixed computation depth per forward pass. Chain-of-thought simulates step-by-step computation, but each predicted step is itself subject to the same limitation, and one wrong digit or dropped carry corrupts every step that follows. Larger models improve by memorizing common arithmetic patterns, not by learning the algorithm: they pattern-match against computations seen in training rather than executing one. For any arithmetic beyond simple single-digit operations, external computation is the only reliable path.
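For reference, the procedure the model would have to carry out can be sketched in Python as textbook long multiplication, assuming operands arrive as non-negative decimal digit strings (`schoolbook_multiply` is a hypothetical name). The two phases mirror the argument above: the digit products are independent of one another, while each iteration of the carry loop needs the carry out of the previous column. This is the kind of computation worth handing to a code-execution tool rather than asking the model to predict digit by digit.

```python
def schoolbook_multiply(a: str, b: str) -> str:
    """Multiply two non-negative decimal strings digit by digit."""
    # Store digits least-significant first.
    x = [int(d) for d in reversed(a)]
    y = [int(d) for d in reversed(b)]

    # Phase 1: accumulate all len(x) * len(y) digit products into column sums.
    # No product depends on any other, so this phase has no sequential chain.
    columns = [0] * (len(x) + len(y))
    for i, dx in enumerate(x):
        for j, dy in enumerate(y):
            columns[i + j] += dx * dy

    # Phase 2: ripple carry propagation. Column k consumes the carry produced
    # by column k-1, so these updates form a chain of len(columns) steps.
    carry = 0
    digits = []
    for total in columns:
        total += carry
        digits.append(total % 10)
        carry = total // 10
    # The product of a len(x)-digit and a len(y)-digit number fits in
    # len(x) + len(y) digits, so no carry is left over at this point.

    # Strip leading zeros, keeping at least one digit.
    while len(digits) > 1 and digits[-1] == 0:
        digits.pop()
    return "".join(str(d) for d in reversed(digits))
```

`schoolbook_multiply("3847", "2956")` returns `"11371732"`: 16 digit products for two 4-digit operands, followed by a carry loop over 8 columns in which every step waits on the one before it.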
⚠ Workarounds are unverified: always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T15:27:21.586748+00:00— report_created — created