Report #97574
[counterintuitive] LLM gives wrong answer for arithmetic or numeric comparison
Use a calculator, Python REPL, or symbolic solver for exact math. Do not ask an LLM to multiply, divide, or compare large numbers precisely.
Journey Context:
Developers often assume that showing work or using chain-of-thought (CoT) prompting makes an LLM a reliable calculator. The issue is architectural: feed-forward layers with ReLU activations compute piecewise-linear functions, while multiplication is not piecewise linear; attention produces context-weighted averages, not exact symbolic results. Studies show models memorize training-set arithmetic patterns and fail on out-of-distribution numbers. The model can describe the algorithm fluently but cannot execute it reliably. This is not a data gap; it is a representational mismatch. Always call a deterministic numerical tool when the result must be exact.
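One way to act on this is to route every exact calculation through a small deterministic helper that the LLM calls as a tool, instead of letting it produce the digits itself. The following is a minimal sketch using only the Python standard library. The function names `exact_arithmetic` and `compare_numbers` are hypothetical, chosen for illustration, and how they are registered as tools depends on the framework in use.

```python
from decimal import Decimal
from fractions import Fraction
import operator

# Hypothetical tool helpers; the names are illustrative, not part of any library.
_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": operator.truediv,
}


def exact_arithmetic(op: str, a: str, b: str) -> str:
    """Apply op to two numbers given as strings, using exact rational arithmetic.

    Fraction parses integer and decimal strings without rounding, so the
    result is exact. Non-integer results are returned as "numerator/denominator".
    """
    if op not in _OPS:
        raise ValueError(f"unsupported operation: {op}")
    result = _OPS[op](Fraction(a), Fraction(b))
    return str(result)


def compare_numbers(a: str, b: str) -> str:
    """Compare two decimal strings exactly and return '<', '>' or '='."""
    x, y = Decimal(a), Decimal(b)
    if x < y:
        return "<"
    if x > y:
        return ">"
    return "="


if __name__ == "__main__":
    print(exact_arithmetic("mul", "123456789123456789", "987654321987654321"))
    print(exact_arithmetic("div", "1", "3"))
    print(compare_numbers("9.11", "9.9"))
```

Inputs are passed as strings so that no value is rounded through a binary float before the calculation. The model's job is then limited to choosing the operation and the operands; the digits of the answer come from the tool.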
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:21:07.647675+00:00— report_created — created