Report #98041
[counterintuitive] Can LLMs perform true multi-step compositional reasoning the way humans do?
No. They approximate compositional tasks via pattern matching and interpolation, and accuracy degrades sharply as composition depth grows. For complex reasoning, break problems into smaller steps that can each be verified, delegate exact computation to tools such as calculators, and validate the final output.
Journey Context:
It is tempting to treat an LLM that solves many coding or math problems as a general reasoner. Dziri et al. (2023) found that transformers approximate compositional tasks through linearized subgraph matching rather than explicit rule-based reasoning, so accuracy drops sharply as composition depth increases. This helps explain why models fail on novel combinations of known concepts, on edge cases, and on tasks that require many dependent steps, where an error in one step carries into every later step. For agent building, do not ask an LLM to plan deeply in one shot: decompose the task, verify each intermediate output, and use tools for computation, search, and symbolic reasoning.
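The decompose-and-verify advice above can be sketched in Python. This is a minimal illustration, not a reference implementation: `flaky_model` is a hypothetical stand-in for an LLM call (no real model or API is used), and multi-digit multiplication is chosen only because it splits cleanly into small dependent steps. Each per-digit partial product is proposed by the "model", checked against exact arithmetic (the "calculator" tool), and only the verified value is carried forward.

```python
from typing import Callable, Tuple

# Hypothetical stand-in for an LLM call: proposes the product of two ints.
Proposer = Callable[[int, int], int]


def flaky_model(a: int, b: int) -> int:
    """Toy 'model' (assumption): right on small operands, off by one otherwise."""
    return a * b if a < 100 else a * b + 1


def verified_step(a: int, b: int, propose: Proposer) -> Tuple[int, bool]:
    """Run one small step and check it with a tool.

    Returns the exact value and whether the model's proposal matched it.
    """
    guess = propose(a, b)
    exact = a * b  # the "calculator" tool: exact integer arithmetic
    return exact, guess == exact


def multiply_decomposed(a: int, b: int, propose: Proposer) -> Tuple[int, int]:
    """Compute a * b as a sum of per-digit partial products.

    Returns (result, corrections), where corrections counts the steps in
    which the tool had to override the model's proposal.
    """
    if b < 0:
        raise ValueError("this sketch assumes a non-negative multiplier")
    total = 0
    corrections = 0
    # Walk the digits of b from least to most significant.
    for place, digit_char in enumerate(reversed(str(b))):
        value, ok = verified_step(a, int(digit_char), propose)
        if not ok:
            corrections += 1
        total += value * 10 ** place  # combine with exact arithmetic, not the model
    return total, corrections


if __name__ == "__main__":
    print(multiply_decomposed(12, 34, flaky_model))
    print(multiply_decomposed(1234, 567, flaky_model))
```

In this toy case the check recomputes the answer outright, so the model adds nothing. In a real agent the verifier would more plausibly be a cheaper or independent check, such as a unit test, a type checker, or a schema validation, and a failed check would trigger a retry or a fallback to the tool.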
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-26T05:08:14.245092+00:00— report_created — created