Report #68694
[counterintuitive] Why does the model fail at multi-digit multiplication despite chain-of-thought prompting?
Use code execution or calculator tools for any arithmetic beyond simple single-digit operations; chain-of-thought improves reasoning structure but does not create reliable algorithmic arithmetic.
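One way to act on this recommendation is to hand every arithmetic expression to a small exact evaluator instead of letting the model compute it in text. The following is a minimal sketch, not a specific framework's tool API: the name `evaluate_arithmetic` and the set of allowed operators are assumptions chosen for illustration. It parses the expression with Python's `ast` module and refuses anything that is not integer arithmetic, so it avoids `eval()` on model output.

```python
import ast
import operator

# Hypothetical helper for illustration; the allowed operators are an assumption.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}


def evaluate_arithmetic(expression: str) -> int:
    """Evaluate an integer arithmetic expression exactly, without eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept plain integer literals only (bool is rejected on purpose).
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    # The model emits the expression; the tool computes the value.
    print(evaluate_arithmetic("48271 * 90317"))
```

Python integers have arbitrary precision, so the result is exact regardless of operand length. In a real deployment the model would emit the expression as a tool call and receive the computed value back; that wiring depends on the provider and is left out here.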
Journey Context:
Developers often assume that chain-of-thought (CoT) prompting makes arithmetic reliable by breaking it into steps. CoT does improve reasoning structure, but the model still approximates each digit-level operation from learned statistical patterns rather than executing a carry algorithm. For small operands, the model has effectively memorized the answers; for large operands, it produces plausible-looking but incorrect results, because one wrong digit or dropped carry in any intermediate step corrupts the final product. No amount of prompting creates the precise algorithmic carry mechanism that reliable multi-digit multiplication requires. This is a compositional generalization failure inherent to the architecture.
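For reference, the carry algorithm that the paragraph above says the model does not execute can be written in a few lines. This is a sketch of ordinary schoolbook multiplication on decimal strings; the function name `schoolbook_multiply` is illustrative. Every digit product and every carry must be exact, which is the property that pattern-based approximation does not guarantee.

```python
def schoolbook_multiply(a: str, b: str) -> str:
    """Multiply two non-negative decimal strings with explicit carries."""
    # Store digits least-significant first so index i means 10**i.
    x = [int(d) for d in reversed(a)]
    y = [int(d) for d in reversed(b)]
    result = [0] * (len(x) + len(y))

    for i, dx in enumerate(x):
        carry = 0
        for j, dy in enumerate(y):
            # Existing digit + new digit product + carry from the previous column.
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10
            carry = total // 10
        # The leftover carry lands one column past the last digit of b.
        result[i + len(y)] += carry

    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


if __name__ == "__main__":
    product = schoolbook_multiply("48271", "90317")
    # Cross-check against Python's built-in exact integer multiplication.
    assert int(product) == 48271 * 90317
    print(product)
```

Multiplying two n-digit numbers this way takes on the order of n² digit operations, each of which has to be correct; a single error anywhere changes the result.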
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:47:14.902591+00:00— report_created — created