Report #58380
[counterintuitive] The model keeps getting basic arithmetic wrong on large numbers—how do I prompt it better?
Use code execution (Python interpreter, calculator tool) for any arithmetic beyond simple single-digit operations. Do not rely on the model's direct text output for numerical computation, regardless of model size or prompt sophistication.
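A minimal sketch of what a calculator tool could look like, assuming the model emits a plain integer expression as a string and the host application evaluates it. The function name `calculate` and the set of allowed operators are illustrative assumptions, not part of any specific vendor's tool API. The expression is parsed with Python's `ast` module instead of `eval`, so only whitelisted arithmetic is ever executed.

```python
import ast
import operator

# Whitelisted operators; anything else is rejected.
# Exponentiation is deliberately left out to avoid runaway computations.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str) -> int:
    """Evaluate an integer arithmetic expression exactly (hypothetical tool entry point)."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept integer literals only (bool is excluded by the exact type check).
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    # Python integers are arbitrary precision, so the product is exact.
    print(calculate("4839201 * 7720463"))
```

In this setup the model's job is only to produce the expression; the digits in the final answer come from the interpreter, not from token prediction.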
Journey Context:
Developers see GPT-4 solve calculus problems but fail at 7-digit multiplication and assume it is a prompting issue. It is not. LLMs have no arithmetic logic unit (ALU); they 'compute' by pattern-matching against training data. Small arithmetic is effectively memorized, while large arithmetic has too many operand combinations to memorize. The model is doing next-token prediction over digit sequences, not executing carry operations. No amount of chain-of-thought creates an ALU where none exists. Reliable arithmetic requires a neurosymbolic component or a tool-use pathway in the system, which is why tools such as OpenAI's code interpreter are offered as first-class features.
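For contrast, here is a sketch of the deterministic procedure that exact multiplication relies on: schoolbook long multiplication with explicit carries. The function name `long_multiply` is illustrative. The point is that every digit of the result depends on carries propagated from lower positions, which is the step that next-token prediction does not actually perform.

```python
def long_multiply(a: str, b: str) -> str:
    """Multiply two non-negative decimal strings digit by digit, with explicit carries."""
    # Store digits least-significant first so index i corresponds to 10**i.
    x = [int(d) for d in reversed(a)]
    y = [int(d) for d in reversed(b)]
    result = [0] * (len(x) + len(y))

    for i, dx in enumerate(x):
        carry = 0
        for j, dy in enumerate(y):
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10   # digit that stays in this position
            carry = total // 10          # amount pushed to the next position
        result[i + len(y)] += carry      # leftover carry for this row

    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


if __name__ == "__main__":
    # Cross-check against Python's built-in exact integer arithmetic.
    print(long_multiply("4839201", "7720463") == str(4839201 * 7720463))
```

In practice there is no need to reimplement this; Python's built-in `int` already does exact arbitrary-precision arithmetic, which is what a code execution tool would use.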
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:28:53.543175+00:00— report_created — created