Report #45547
[counterintuitive] Why does chain-of-thought not fix the model's arithmetic errors on multi-digit multiplication and addition?
Use code execution or calculator tools for any arithmetic beyond simple single-digit operations. Chain-of-thought helps with reasoning decomposition but cannot overcome the tokenization problem for numbers.
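A minimal sketch of what such a calculator tool could look like, assuming a Python host that receives an arithmetic expression as a string from the model. The function name `calculate` and the set of allowed operators are hypothetical and chosen for illustration; they do not come from any vendor SDK. The sketch parses the expression with the standard `ast` module instead of calling `eval()`, so only integer arithmetic is accepted.

```python
import ast
import operator

# Hypothetical whitelist of operators this illustrative tool accepts.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}


def calculate(expression: str) -> int:
    """Evaluate an integer arithmetic expression exactly, without eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept plain integer literals only (bool is excluded on purpose).
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))


print(calculate("347 * 892"))
print(calculate("8347 + 83479"))
```

Python integers are arbitrary precision, so the result is exact regardless of operand length. Anything outside the whitelist (names, function calls, attribute access) raises `ValueError` rather than being executed.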
Journey Context:
The common belief is that because chain-of-thought dramatically improves reasoning, it should fix arithmetic too. It helps, but it hits a hard wall. The root cause is that BPE tokenizes numbers unpredictably: '8347' might be one token, while '83479' might be two tokens, ['834', '79']. The model therefore does not see digits in positional notation, and the same digit can land at a different place inside a token from one number to the next. Multi-digit multiplication requires tracking carries across digit positions that the model cannot reliably identify. This is why GPT-4 can explain quantum mechanics but still fails at 4-digit multiplication. CoT helps by decomposing '347 × 892' into partial products, but each partial product still requires the model to operate on digit positions it cannot see within tokens. The fix is not better prompting; it is giving the model a calculator. Anthropic and OpenAI both ship calculator/code tools in their products for exactly this reason.
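The alignment problem described above can be illustrated with a toy sketch. `chunk_left_to_right` is a hypothetical stand-in that splits a digit string into fixed-size chunks from the left; it is not a real BPE tokenizer, and real tokenizers will split these strings differently. It only shows how the same chunk can cover different place values in different numbers. `partial_products` is the ordinary long-multiplication decomposition that CoT asks the model to carry out, written here for non-negative integers.

```python
def chunk_left_to_right(number: str, size: int = 3) -> list[str]:
    """Toy stand-in for a subword tokenizer: fixed-size chunks from the left."""
    return [number[i:i + size] for i in range(0, len(number), size)]


def partial_products(a: int, b: int) -> list[int]:
    """One partial product per digit of b, shifted by that digit's place value.

    Assumes b is a non-negative integer.
    """
    digits_low_to_high = str(b)[::-1]
    return [a * int(d) * 10 ** place for place, d in enumerate(digits_low_to_high)]


# The chunk '834' starts at the thousands place in one number and at the
# ten-thousands place in the other, although the text of the chunk is identical.
print(chunk_left_to_right("8347"))
print(chunk_left_to_right("83479"))

# The decomposition itself is exact when each step runs on real integers.
parts = partial_products(347, 892)
print(parts, sum(parts))
```

Code sees every digit at a known position, so the partial products sum to the exact answer. A model working over chunks has to infer each chunk's place value from context, which is where carries tend to go wrong.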
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:55:35.866363+00:00— report_created — created