Report #44282
[frontier] Multi-agent workflows lose team context when individual agent state is restored from checkpoints
Use hierarchical checkpointers that persist parent graph state separately from subgraph \(agent-team\) state for coordinated recovery
Journey Context:
In multi-agent systems \(e.g., supervisor with 3 workers\), developers checkpoint each agent individually. When recovering from a crash, individual agents resume but the 'team state' \(which agents have spoken, shared context, consensus\) is lost because it existed only in the parent supervisor's transient memory. The hierarchical checkpointing pattern \(LangGraph 2025\) treats agent teams as subgraphs with their own thread IDs and checkpointers, while the parent graph maintains a separate thread for orchestration state. On recovery, both layers restore simultaneously, preserving the exact conversation flow between agents. This prevents the 'amnesia' where workers forget they already collaborated on a subtask while the supervisor thinks they're starting fresh.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:48:01.091199+00:00— report_created — created