Report #100490
[counterintuitive] LLM gets multi-digit arithmetic, symbolic substitution, or compositional reasoning wrong even with detailed step-by-step prompts
Offload exact computation, algebra, and rule-based composition to external verifiable tools (Python interpreter, SymPy, SAT/SMT solver, regex). Use the LLM only to translate the natural-language problem into a formal representation that the tool can execute.
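A minimal sketch of the translate-then-execute split, assuming the LLM has been prompted to return only an arithmetic expression. `llm_translate` is a hypothetical stand-in for that model call (hard-coded here so the example runs offline), and `exact_eval` is a hypothetical whitelist evaluator built on the standard library (`ast`, `fractions`) in place of SymPy, so nothing beyond Python itself is required.

```python
import ast
import operator
from fractions import Fraction

# Whitelisted operators; any other syntax the LLM emits is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def llm_translate(question: str) -> str:
    """Hypothetical stand-in for an LLM call that returns a formal expression.

    A real pipeline would prompt a model to emit ONLY the expression;
    it is hard-coded here so the sketch runs without a model.
    """
    return "(987654321 * 123456789) - 42 / 7"


def exact_eval(expr: str) -> Fraction:
    """Evaluate +, -, *, / over integer literals exactly, using Fraction."""

    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # type(...) is int excludes bools and floats.
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return Fraction(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(ast.parse(expr, mode="eval"))


question = "What is 987654321 times 123456789, minus 42 divided by 7?"
expr = llm_translate(question)  # LLM: natural language -> formal expression
result = exact_eval(expr)       # tool: exact, deterministic execution
print(expr, "=", result)
```

The design choice is that the model never produces the number itself: if it mistranslates the question, the wrong expression is visible and auditable, and anything outside the whitelist (function calls, names, attribute access) raises instead of executing.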
Journey Context:
Developers often assume that model scale plus chain-of-thought prompting will teach models robust algorithms. 'Faith and Fate' (Dziri et al., 2023) shows that transformers fail systematically on compositional tasks because they approximate patterns from their training distribution rather than learning explicit compositional rules: they interpolate familiar patterns but break on out-of-distribution recombinations. The model can parrot the shape of a proof or calculation, but it has no guaranteed execution semantics. The right pattern is LLM-as-compiler/translator, not LLM-as-calculator.
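To make "explicit compositional rules" concrete, here is a minimal sketch of multi-digit multiplication, one of the compositional tasks 'Faith and Fate' studies, written as the grade-school algorithm: every step is a single-digit product plus a carry update, and a tool executes each step exactly. `long_multiply` is a hypothetical helper for illustration only; in practice Python's built-in `int` already does this, and the point is that the result is checkable against an independent implementation.

```python
def long_multiply(a: str, b: str) -> str:
    """Multiply two non-negative decimal strings with the grade-school algorithm.

    Each inner step applies one explicit rule: add a single-digit product and
    the running carry into one column, keep the ones digit, carry the rest.
    """
    if not (a.isascii() and a.isdigit() and b.isascii() and b.isdigit()):
        raise ValueError("inputs must be non-negative decimal strings")
    # result[k] holds the digit for 10**k (least-significant first).
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            total = result[i + j] + int(da) * int(db) + carry
            result[i + j] = total % 10
            carry = total // 10
        # This column has not been written yet, so the carry fits in one digit.
        result[i + len(b)] += carry
    digits = "".join(map(str, reversed(result))).lstrip("0")
    return digits or "0"


x, y = "987654321", "123456789"
product = long_multiply(x, y)
assert product == str(int(x) * int(y))  # independent check with built-in ints
print(product)  # prints 121932631112635269
```

A model imitating this procedure can drop a single carry and still produce an answer of the right shape; the executed version either follows every rule or raises.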
⚠ Workarounds are unverified: always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:19:09.940208+00:00— report_created — created