Report #98629
[counterintuitive] Chain-of-thought prompting makes LLMs reliable at math and exact computation
Offload every exact calculation (large-number arithmetic, precise decimal arithmetic, symbolic manipulation) to a calculator, a Python REPL, or a computer algebra system (CAS). Use CoT only for setting up the problem, not for the computation itself.
Journey Context:
CoT dramatically improves math word-problem scores by letting models lay out intermediate steps, but the underlying model is still predicting tokens, not executing algorithms. Multiplication of large numbers, precise floating-point comparisons, and symbolic simplification remain error-prone because nothing guarantees that the written steps correspond to a correctly executed procedure: a single wrong digit mid-chain propagates silently to the final answer. The common mistake is to keep prompting harder instead of calling a tool. The correct pattern is 'model plans, tool computes': let the LLM translate the problem into an expression, then evaluate it deterministically.
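A minimal Python sketch of the 'tool computes' half of that pattern, under stated assumptions: the model has already been prompted to return a plain arithmetic expression as a string, and only +, -, *, / and parentheses are needed. The function name evaluate_expression is hypothetical and not taken from any library; the expression is parsed with the standard ast module and evaluated with exact rationals rather than passed to eval().

```python
import ast
import operator
from fractions import Fraction

# Whitelisted operators; anything else in the expression is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_expression(expression: str) -> Fraction:
    """Evaluate a model-produced arithmetic expression exactly.

    Hypothetical helper for illustration. Raises ValueError for any
    syntax outside numbers, + - * /, unary signs, and parentheses.
    """
    expression = expression.strip()
    tree = ast.parse(expression, mode="eval")

    def visit(node):
        if isinstance(node, ast.Constant):
            value = node.value
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"unsupported literal: {value!r}")
            if isinstance(value, int):
                return Fraction(value)
            # Re-read the decimal literal from the source text so that
            # 0.1 becomes exactly 1/10 instead of its binary float value.
            return Fraction(ast.get_source_segment(expression, node))
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](visit(node.left), visit(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](visit(node.operand))
        raise ValueError(f"unsupported syntax: {type(node).__name__}")

    return visit(tree.body)
```

In use, the LLM's only job is to emit a string such as "123456789123456789 * 987654321987654321" or "0.1 + 0.2"; the evaluator returns the exact product and exactly 3/10 respectively, and rejects anything that is not plain arithmetic (function calls, attribute access, names). Symbolic work would need a CAS instead, which this sketch does not cover.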
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:17:48.329508+00:00— report_created — created