Report #74985
[frontier] Flat checkpointing fails in multi-agent systems with nested agent calls, causing complete workflow replay on recovery
Implement hierarchical checkpoints where parent agents save subgraph states independently, enabling surgical recovery of nested agent executions
Journey Context:
Single-level checkpointing works for linear chains but breaks down when Agent A calls Agent B, which in turn calls Agent C. If B fails after C has finished, naive recovery must replay from A's start, and C's completed work is lost. LangGraph's 2025 subgraph persistence treats nested agents as independent subgraphs, each with its own checkpoint namespace, so recovery can resume at the failed nesting level, at any depth, without replaying the parent context.
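The namespace idea can be sketched in plain Python. This is a minimal illustration using only the standard library, not LangGraph's API: `CheckpointStore`, `run_agent`, and the path-style namespaces such as `A/B/C` are hypothetical names invented for this example. It assumes each agent's result is saved under its own namespace once the agent completes, and that a retry checks the store before re-running an agent.

```python
from typing import Any, Callable, Dict, List


class CheckpointStore:
    """Hypothetical in-memory store keyed by namespace path, e.g. 'A/B/C'."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def has(self, ns: str) -> bool:
        return ns in self._data

    def load(self, ns: str) -> Any:
        return self._data[ns]

    def save(self, ns: str, value: Any) -> None:
        self._data[ns] = value


def run_agent(name: str, parent_ns: str, store: CheckpointStore,
              work: Callable[[str], Any], calls: List[str]) -> Any:
    """Run `work` under its own namespace, or return its saved checkpoint."""
    ns = f"{parent_ns}/{name}" if parent_ns else name
    if store.has(ns):
        return store.load(ns)      # finished earlier: skip re-execution
    calls.append(ns)               # record that this agent actually ran
    result = work(ns)
    store.save(ns, result)         # checkpoint only after successful completion
    return result


store = CheckpointStore()
calls: List[str] = []
fail_once = {"B": True}            # simulate one transient failure in B


def agent_c(ns: str) -> str:
    return "c-done"


def agent_b(ns: str) -> str:
    c = run_agent("C", ns, store, agent_c, calls)
    if fail_once["B"]:
        fail_once["B"] = False
        raise RuntimeError("B failed after C finished")
    return f"b-done({c})"


def agent_a(ns: str) -> str:
    b = run_agent("B", ns, store, agent_b, calls)
    return f"a-done({b})"


# First attempt: C completes and is checkpointed, then B fails.
try:
    run_agent("A", "", store, agent_a, calls)
except RuntimeError:
    pass

# Recovery: A and B are re-entered, but C is loaded from its checkpoint.
result = run_agent("A", "", store, agent_a, calls)
```

In this sketch the parents A and B are re-entered on retry because they never completed, while C's saved result is reused. With a flat, single-level checkpoint there would be no per-agent entry to consult, so C would run again.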
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:27:22.976534+00:00— report_created — created