Report #99999
[counterintuitive] LLM fails to produce valid multi-step plans even with chain-of-thought
Use a symbolic planner (PDDL, STRIPS, A*) or explicit state-machine execution for real planning. Use the LLM only to translate goals into a formal planning representation or to explain plans, not to generate long action sequences.
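A minimal sketch of what "symbolic planner" can mean in practice: a STRIPS-style forward search in plain Python. The `Action` tuple, the `plan` function, and the toy room/box domain are all hypothetical names invented for illustration. A real deployment would more likely hand a PDDL file to an off-the-shelf planner. Breadth-first search is used here as a simple stand-in for A*.

```python
from collections import deque
from typing import NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: frozenset     # facts that must hold before the action
    add: frozenset     # facts the action makes true
    delete: frozenset  # facts the action makes false


def plan(init, goal, actions) -> Optional[list]:
    """Breadth-first forward search; returns a shortest plan or None."""
    start = frozenset(init)
    goal = frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, steps = frontier.popleft()
        if goal <= state:  # every goal fact holds in this state
            return steps
        for a in actions:
            if a.pre <= state:  # preconditions satisfied
                nxt = (state - a.delete) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, steps + [a.name]))
    return None  # search space exhausted, no valid plan exists


# Hypothetical toy domain: a robot starts in room A, a box sits in room B,
# and the goal is to be in room C holding the box.
ACTIONS = [
    Action("move_a_b", frozenset({"at_a"}), frozenset({"at_b"}), frozenset({"at_a"})),
    Action("move_b_c", frozenset({"at_b"}), frozenset({"at_c"}), frozenset({"at_b"})),
    Action("pick_box", frozenset({"at_b", "box_at_b"}), frozenset({"holding"}),
           frozenset({"box_at_b"})),
]

result = plan({"at_a", "box_at_b"}, {"at_c", "holding"}, ACTIONS)
print(result)
```

Every plan the search returns is valid by construction, because each step is only taken when its preconditions hold in the simulated state. In this setup the LLM's job would be limited to producing the `init`, `goal`, and action definitions from a natural-language request.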
Journey Context:
Many agent builders expect chain-of-thought to yield valid plans. The LLM+P work reported that LLMs alone fail to produce feasible plans for most problems, whereas having the LLM translate the problem into PDDL and letting a classical planner solve it yields correct, often optimal plans. Subsequent PlanBench studies confirm that CoT improves how plans look but not whether they are actually valid. Reliable planning requires symbolic search or an environment simulator, not a longer prompt.
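If an LLM-generated action sequence has to be used anyway, it can at least be checked against a simulator before execution. The sketch below assumes a STRIPS-like model where each action is a `(preconditions, add, delete)` triple of fact sets. The `validate` function and the `DOMAIN` dictionary are hypothetical and only illustrate the idea of "looks plausible" versus "is valid".

```python
def validate(init, goal, actions, steps):
    """Simulate `steps` from `init`; return (ok, message)."""
    state = set(init)
    for i, name in enumerate(steps):
        if name not in actions:
            return False, f"step {i}: unknown action {name!r}"
        pre, add, delete = actions[name]
        missing = pre - state
        if missing:
            # The plan reads fine but cannot be executed at this step.
            return False, f"step {i}: {name} missing {sorted(missing)}"
        state = (state - delete) | add
    unmet = set(goal) - state
    if unmet:
        return False, f"goal not reached, missing {sorted(unmet)}"
    return True, "valid"


# Hypothetical toy domain: action name -> (preconditions, add, delete).
DOMAIN = {
    "move_a_b": ({"at_a"}, {"at_b"}, {"at_a"}),
    "move_b_c": ({"at_b"}, {"at_c"}, {"at_b"}),
    "pick_box": ({"at_b", "box_at_b"}, {"holding"}, {"box_at_b"}),
}
INIT = {"at_a", "box_at_b"}
GOAL = {"at_c", "holding"}

# A plausible-looking sequence that picks up the box after leaving its room.
bad_ok, bad_msg = validate(INIT, GOAL, DOMAIN, ["move_a_b", "move_b_c", "pick_box"])
# The same actions in an executable order.
good_ok, good_msg = validate(INIT, GOAL, DOMAIN, ["move_a_b", "pick_box", "move_b_c"])
print(bad_ok, bad_msg)
print(good_ok, good_msg)
```

Both sequences contain the same three actions, so they look equally reasonable in a transcript. Only simulation shows that the first one breaks at `pick_box`.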
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:25:16.185905+00:00— report_created — created