Report #98630
[counterintuitive] An agent loop with chain-of-thought is enough for multi-step planning and backtracking
Use explicit planning data structures (state machines, search trees, todo lists, or external solvers) and let the LLM operate on one coherent unit of thought at a time, rather than expecting it to hold and revise a global plan implicitly within a single generation.
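As one illustration of an explicit planning structure, here is a minimal sketch of a todo-list plan with per-step status. The names `Plan`, `Step`, `run_plan`, `execute`, and `replan` are hypothetical, not taken from any library. `execute` and `replan` stand in for LLM or tool calls. The plan itself lives outside the model, and a failed step is swapped for alternatives rather than retried blindly.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional, Tuple


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Step:
    description: str
    status: Status = Status.PENDING
    result: Optional[str] = None


@dataclass
class Plan:
    steps: List[Step] = field(default_factory=list)

    def next_pending(self) -> Optional[Step]:
        # The plan, not the model, decides what to work on next.
        return next((s for s in self.steps if s.status is Status.PENDING), None)

    def replace(self, failed: Step, alternatives: List[str]) -> None:
        # Swap a failed step for new pending steps at the same position.
        i = next(k for k, s in enumerate(self.steps) if s is failed)
        self.steps[i:i + 1] = [Step(d) for d in alternatives]


def run_plan(
    plan: Plan,
    execute: Callable[[str], Tuple[bool, str]],   # assumed: one LLM/tool call per step
    replan: Callable[[str, str], List[str]],      # assumed: LLM proposes alternatives
    max_iterations: int = 20,
) -> bool:
    for _ in range(max_iterations):
        step = plan.next_pending()
        if step is None:
            return True
        ok, step.result = execute(step.description)
        if ok:
            step.status = Status.DONE
        else:
            step.status = Status.FAILED
            plan.replace(step, replan(step.description, step.result))
    # Budget exhausted: succeed only if nothing is left to do.
    return plan.next_pending() is None
```

The iteration cap is a design choice: it bounds cost when `replan` keeps proposing steps that fail.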
Journey Context:
Standard autoregressive decoding is local: it commits to one token at a time, left to right, with no built-in lookahead or backtracking. Yao et al. (Tree of Thoughts, 2023) showed that tasks such as Game of 24, mini crosswords, and creative writing need deliberate search and global choices that chain-of-thought handles poorly: on Game of 24, GPT-4 with chain-of-thought solved 4% of tasks, against 74% with Tree of Thoughts. The community often adds 'reflection' prompts, but reflection without a structured search space is just more local sampling. The better model is Tree of Thoughts or an explicit planner: maintain candidate states, evaluate them, and backtrack when a branch fails. The LLM becomes a node generator and evaluator, not the planner itself.
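The generate, evaluate, backtrack loop can be sketched on Game of 24 as follows. This is a simplified illustration, not the implementation from Yao et al. `propose` and `evaluate` are hypothetical names, and here they are deterministic stand-ins for what would be LLM calls. The search is a plain depth-first search that prunes states the evaluator rules out and backtracks from dead ends.

```python
from fractions import Fraction
from itertools import combinations
from typing import List, Optional, Tuple

TARGET = Fraction(24)
# A state is (numbers still available, arithmetic steps taken so far).
State = Tuple[Tuple[Fraction, ...], Tuple[str, ...]]


def combine(a: Fraction, b: Fraction) -> List[Tuple[Fraction, str]]:
    """All results of applying one arithmetic operation to a and b."""
    out = [(a + b, f"{a} + {b}"), (a * b, f"{a} * {b}"),
           (a - b, f"{a} - {b}"), (b - a, f"{b} - {a}")]
    if b != 0:
        out.append((a / b, f"{a} / {b}"))
    if a != 0:
        out.append((b / a, f"{b} / {a}"))
    return out


def propose(state: State) -> List[State]:
    """Node generator (stand-in for an LLM 'propose next thoughts' call)."""
    nums, steps = state
    children = []
    for i, j in combinations(range(len(nums)), 2):
        rest = tuple(n for k, n in enumerate(nums) if k not in (i, j))
        for value, expr in combine(nums[i], nums[j]):
            children.append((rest + (value,), steps + (f"{expr} = {value}",)))
    return children


def evaluate(state: State) -> float:
    """Node evaluator (stand-in for an LLM 'sure / maybe / impossible' call)."""
    nums, _ = state
    if len(nums) == 1:
        return 1.0 if nums[0] == TARGET else 0.0
    if len(nums) == 2:
        # Exact one-step lookahead: can the last two numbers reach the target?
        return 1.0 if any(v == TARGET for v, _ in combine(*nums)) else 0.0
    return 0.5  # too early to tell


def search(state: State) -> Optional[List[str]]:
    nums, steps = state
    if len(nums) == 1:
        return list(steps) if nums[0] == TARGET else None
    scored = [(evaluate(child), child) for child in propose(state)]
    scored = [sc for sc in scored if sc[0] > 0.0]       # prune impossible states
    scored.sort(key=lambda sc: sc[0], reverse=True)     # most promising first
    for _, child in scored:
        solution = search(child)
        if solution is not None:
            return solution
    return None  # every child failed: backtrack to the caller


def solve(numbers: List[int]) -> Optional[List[str]]:
    return search((tuple(Fraction(n) for n in numbers), ()))
```

Calling `solve([4, 9, 10, 13])` returns a list of three arithmetic steps ending in 24, and `solve([1, 1, 1, 1])` returns `None`. The search state and the backtracking live in ordinary code, so swapping the two stand-in functions for model calls would not change the control flow.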
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T05:17:51.445843+00:00— report_created — created