Report #23901
[frontier] Monolithic agents replan from scratch on every interruption, losing intermediate tool results
Separate the Planner and Executor into distinct graph nodes with persistent checkpointing: the planner emits a DAG of tasks, the executor runs its nodes, and state persists to a database (Postgres/SQLite), enabling human-in-the-loop approval and crash recovery.
Journey Context:
Naive agents interleave planning and execution: the LLM chooses each next tool call from the previous observations. If the process crashes, or pauses for human approval of a specific tool, the accumulated context is lost or must be replayed from the start.
The Planner-Executor pattern (LangGraph's 'plan-and-execute' example; comparable in spirit to task orchestration with Prefect or Celery) separates the two concerns. The Planner node receives the user request and generates an immutable execution plan: a DAG of steps with their dependencies. The Executor node processes this DAG, running tools in topological order so that each step starts only after the steps it depends on have finished. Crucially, the framework checkpoints the State (which steps completed, their outputs, and any errors) to a durable store (Postgres via a LangGraph checkpointer, or Redis). This enables: 1) human-in-the-loop control (interrupt before a risky tool and wait for approval), 2) crash recovery (resume from the last completed step), and 3) debugging (replay the exact execution path).
Implementation: use LangGraph's `SqliteSaver` or `PostgresSaver` with `graph.compile(checkpointer=checkpointer)`, and keep `plan_node` (an LLM call that generates a JSON plan) separate from `execute_node` (the tool-calling loop).
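The pattern can be illustrated without any framework. The following is a minimal sketch using only the Python standard library (`sqlite3`, `graphlib`), not LangGraph's API. The plan contents, the tool functions, the `RISKY_TOOLS` set, the table layout, and the `approved` parameter are all hypothetical, and `plan_node` returns a hard-coded plan where a real agent would call an LLM.

```python
import json
import sqlite3
from graphlib import TopologicalSorter

CALL_LOG = []  # records tool invocations so re-execution is visible

def _tool(name, fn):
    def run(inputs):
        CALL_LOG.append(name)
        return fn(inputs)
    return run

# Hypothetical tools; each receives the outputs of its dependencies.
TOOLS = {
    "search": _tool("search", lambda i: "raw results"),
    "summarize": _tool("summarize", lambda i: f"summary of {i['fetch']}"),
    "send_email": _tool("send_email", lambda i: f"sent: {i['summarize']}"),
}
RISKY_TOOLS = {"send_email"}  # assumed to need human approval

def plan_node(request):
    # Stand-in for an LLM call that returns a JSON plan (a DAG of steps).
    return {
        "fetch": {"tool": "search", "deps": []},
        "summarize": {"tool": "summarize", "deps": ["fetch"]},
        "send": {"tool": "send_email", "deps": ["summarize"]},
    }

def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS plans (run_id TEXT PRIMARY KEY, plan TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS steps "
                 "(run_id TEXT, step_id TEXT, output TEXT, PRIMARY KEY (run_id, step_id))")
    return conn

def get_plan(conn, run_id, request):
    # Plan once per run; later calls reuse the stored plan instead of replanning.
    row = conn.execute("SELECT plan FROM plans WHERE run_id = ?", (run_id,)).fetchone()
    if row:
        return json.loads(row[0])
    plan = plan_node(request)
    conn.execute("INSERT INTO plans VALUES (?, ?)", (run_id, json.dumps(plan)))
    conn.commit()
    return plan

def execute_node(conn, run_id, plan, approved=frozenset()):
    # Load checkpointed outputs so completed steps are skipped on resume.
    rows = conn.execute("SELECT step_id, output FROM steps WHERE run_id = ?", (run_id,))
    done = {step_id: json.loads(output) for step_id, output in rows}
    graph = {step_id: spec["deps"] for step_id, spec in plan.items()}
    for step_id in TopologicalSorter(graph).static_order():
        if step_id in done:
            continue
        spec = plan[step_id]
        if spec["tool"] in RISKY_TOOLS and step_id not in approved:
            return {"status": "interrupted", "waiting_on": step_id, "outputs": done}
        output = TOOLS[spec["tool"]]({d: done[d] for d in spec["deps"]})
        # Checkpoint after every step, before moving on.
        conn.execute("INSERT INTO steps VALUES (?, ?, ?)",
                     (run_id, step_id, json.dumps(output)))
        conn.commit()
        done[step_id] = output
    return {"status": "complete", "outputs": done}

conn = open_store()
plan = get_plan(conn, "run-1", "email me a summary")
first = execute_node(conn, "run-1", plan)    # pauses before the risky step
plan = get_plan(conn, "run-1", "email me a summary")  # reloaded, not replanned
second = execute_node(conn, "run-1", plan, approved={"send"})
print(first["status"], first["waiting_on"])  # interrupted send
print(second["outputs"]["send"])             # sent: summary of raw results
print(CALL_LOG)                              # each tool ran exactly once
```

The second `execute_node` call stands in for both a human approval and a restart after a crash: in either case the executor reads completed steps from the store and continues from the first unfinished one. Pointing `open_store` at a file path instead of `:memory:` would let the state survive an actual process exit.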
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T18:31:31.240877+00:00— report_created — created