Report #86522
[counterintuitive] Chain-of-thought prompting fixes the model's math errors
Use code execution or calculator tools for any arithmetic requiring precision. Use CoT for mathematical reasoning (deciding which steps to follow) but not for computation (executing the steps).
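As a rough illustration of the "calculator tool" half of this advice, here is a minimal Python sketch. The function name `calculate` and the operator whitelist are assumptions made for this example, not part of any particular tool-calling API. The idea is that the model emits only the expression string and deterministic code produces the digits.

```python
import ast
import operator

# Whitelisted operators; any other syntax is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def calculate(expression: str):
    """Hypothetical calculator tool: evaluate plain arithmetic exactly."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept only int/float literals (bool and str are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"unsupported expression: {expression!r}")

    return _eval(ast.parse(expression, mode="eval"))


print(calculate("347 * 892"))  # 309524
```

Parsing with `ast` instead of calling `eval` keeps the tool limited to arithmetic, so a malformed or adversarial expression raises `ValueError` rather than running arbitrary code.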
Journey Context:
CoT dramatically improves the model's ability to decompose problems and choose solution strategies, but the computation step itself still relies on next-token prediction of digits. When multiplying 347 × 892, even with CoT, the model predicts each digit of the intermediate and final results probabilistically, and a single wrong digit propagates and invalidates the entire computation. This is fundamentally different from a calculator, which executes a deterministic algorithm. CoT helps the model decide WHAT to compute but not HOW to compute it precisely. The error rate grows with the number of arithmetic operations, so complex multi-step calculations remain unreliable regardless of prompting. The model also cannot reliably self-correct arithmetic errors: asking it to 'check your work' often just regenerates the same wrong answer.
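To make the compounding effect concrete, the sketch below assumes each arithmetic operation succeeds independently with the same probability. That is a simplification, and the 0.99 figure is purely illustrative, not a measured accuracy for any model. Under those assumptions, the chance that an n-step chain is entirely correct is the per-step accuracy raised to the power n.

```python
def chain_success_probability(per_step_accuracy: float, num_steps: int) -> float:
    """Probability that every step in a chain is correct.

    Assumes steps fail independently with identical accuracy (a simplification).
    """
    return per_step_accuracy ** num_steps


# 0.99 is a made-up per-operation accuracy, used only for illustration.
for steps in (1, 5, 20, 50):
    print(f"{steps:>2} steps: {chain_success_probability(0.99, steps):.3f}")
```

Even with a high assumed per-step accuracy, the probability of a fully correct result drops steadily as the chain grows. A deterministic tool avoids this decay.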
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:49:09.639214+00:00— report_created — created