Report #61653
[counterintuitive] Step-by-step prompting should get the model to do accurate arithmetic
Route all exact numerical computation through a code interpreter or calculator tool. Use chain-of-thought for deciding WHAT to compute (which operations, in what order) but never for executing the computation itself. The model chooses the formula; code computes the result.
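A minimal sketch of that division of labour, assuming the model returns its chosen formula as a plain arithmetic expression string. The helper name `evaluate_expression` and the sample expression are hypothetical, and this is not any vendor's tool API; it is only a restricted evaluator built on Python's standard `ast` module, so that model-written text is never passed to `eval`.

```python
import ast
import operator

# Whitelist of arithmetic operators the evaluator accepts.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_expression(expression: str):
    """Evaluate a model-written arithmetic expression (hypothetical helper)."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept only plain int/float literals (bool and str are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return _eval(ast.parse(expression, mode="eval"))


# The model decides WHAT to compute; Python computes it.
model_expression = "(847 + 239) * 12"  # hypothetical model output
print(evaluate_expression(model_expression))  # 13032
```

Anything outside the whitelist (function calls, names, attribute access) raises `ValueError`, which keeps the tool limited to arithmetic. Integer results are exact; results involving `/` or float literals are subject to ordinary floating-point rounding.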
Journey Context:
There is a critical distinction between mathematical reasoning (knowing what to compute) and arithmetic execution (actually computing it). Chain-of-thought substantially helps the former but cannot reliably deliver the latter. LLMs represent numbers as token embeddings, not as numerical values. Depending on the tokenizer, '847' may be a single token or be split into several, and in either case its embedding carries no built-in representation of 8 times 100 plus 4 times 10 plus 7. When a model 'adds' 847 and 239, it is largely pattern-matching against similar additions seen in training rather than running an exact carry-propagation procedure. This is why models can solve complex word problems (reasoning strength) yet can fail at large-number addition (execution weakness). Step-by-step prompting helps by decomposing the reasoning into smaller pattern-matchable chunks, but each arithmetic step still relies on approximate pattern matching. The architecture has no arithmetic logic unit, so nothing in it guarantees an exact numerical result. This is the rationale for pairing a model with a tool such as OpenAI's Code Interpreter: the model writes the expression, Python evaluates it.
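For contrast, here is a small illustrative sketch of what exact execution involves for the 847 + 239 example: an explicit place-value expansion and a schoolbook addition with carry propagation. The function names are hypothetical and chosen only for this illustration; in practice Python's built-in integers already do this exactly.

```python
def positional_value(digits: str) -> int:
    """Rebuild a number from its digits, e.g. '847' -> 8*100 + 4*10 + 7."""
    total = 0
    for digit in digits:
        # Shift the running total one decimal place, then add the next digit.
        total = total * 10 + int(digit)
    return total


def add_with_carry(a: str, b: str) -> str:
    """Schoolbook addition on digit strings with explicit carry propagation."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    carry = 0
    result = []
    # Walk from the least significant digit to the most significant one.
    for da, db in zip(reversed(a), reversed(b)):
        carry, digit = divmod(int(da) + int(db) + carry, 10)
        result.append(str(digit))
    if carry:
        result.append(str(carry))
    return "".join(reversed(result))


print(positional_value("847"))       # 847
print(add_with_carry("847", "239"))  # 1086
```

Every step here is deterministic: the carry out of 7 + 9 must be propagated into the tens column, and the carry out of 8 + 2 creates a new leading digit. An interpreter performs this exactly regardless of operand length, which is the property the recommendation relies on.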
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T09:58:22.510224+00:00— report_created — created