Report #62642
[counterintuitive] LLM fails at simple arithmetic or symbolic logic despite passing complex exams
Offload arithmetic and strict symbolic logic to a calculator or symbolic solver (e.g., a Python interpreter); do not ask the LLM to compute the result natively.
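One way to apply this is to have the model emit only the expression and let ordinary code evaluate it. The following is a minimal sketch, not a vetted tool: `evaluate_arithmetic` is a hypothetical helper name, and the whitelist of operators is an assumption chosen to keep the example small (exponentiation is left out on purpose so a model-supplied expression cannot trigger a huge computation).

```python
import ast
import operator

# Whitelisted operators (an assumption for this sketch); anything else is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def evaluate_arithmetic(expression: str):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept only int and float literals (bool and str are rejected).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval"))


if __name__ == "__main__":
    # The LLM would supply the expression string; Python does the arithmetic.
    print(evaluate_arithmetic("123456789 * 987654321"))
```

Walking the syntax tree instead of calling `eval()` means that function calls, names, and attribute access in the model's output are rejected with `ValueError` rather than executed.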
Journey Context:
It is counterintuitive that a model can pass the Bar Exam but fail 3rd-grade multiplication. LLMs learn reasoning as pattern matching over text, not as the execution of formal algorithms. When asked to multiply large numbers, the model tries to recall the 'pattern' of the answer rather than performing the carry-over algorithm step by step. This is why performance degrades rapidly on out-of-distribution inputs (e.g., very large numbers not seen in training). It is a fundamental limitation of learning from the distribution of text alone.
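For contrast, the carry-over algorithm mentioned above is a short deterministic procedure whose correctness does not depend on the size of the operands. The sketch below is illustrative only; `schoolbook_multiply` is a hypothetical name, and operands are assumed to be non-negative decimal strings.

```python
def schoolbook_multiply(a: str, b: str) -> str:
    """Multiply two non-negative decimal strings digit by digit with carries."""
    for s in (a, b):
        if not (s.isascii() and s.isdigit()):
            raise ValueError("expected non-negative decimal strings")

    # Work with least-significant digit first, as in pen-and-paper multiplication.
    x = [int(d) for d in reversed(a)]
    y = [int(d) for d in reversed(b)]
    result = [0] * (len(x) + len(y))

    for i, dx in enumerate(x):
        carry = 0
        for j, dy in enumerate(y):
            total = result[i + j] + dx * dy + carry
            result[i + j] = total % 10   # digit that stays in this column
            carry = total // 10          # amount carried to the next column
        result[i + len(y)] += carry      # final carry of this row

    digits = "".join(str(d) for d in reversed(result)).lstrip("0")
    return digits or "0"


if __name__ == "__main__":
    # Cross-check against Python's built-in arbitrary-precision integers.
    a, b = "123456789", "987654321"
    print(schoolbook_multiply(a, b) == str(int(a) * int(b)))
```

Each step is executed exactly, so the result is the same whether or not the operands resemble anything seen before, which is the property pattern recall lacks.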
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:37:39.060009+00:00— report_created — created