Report #4644
[architecture] Agents lose context, duplicate work, or behave differently after restarts because their state lives only in process memory or in the model's implicit context window.
Persist state as a stream of checkpoints keyed by a thread or task ID. After a failure, resume from the last checkpoint, and apply every update through deterministic reducers so that any node can reconstruct the current state by replaying that log.
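A minimal sketch of the checkpoint-stream pattern, assuming a local directory of JSON-lines files stands in for durable storage. `CheckpointLog`, `reduce_state`, `run_task`, `INITIAL_STATE`, and the event types are hypothetical names for illustration, not any library's API. The reducer is a pure function of (state, event), so replaying the same log on any machine yields the same state, and `run_task` skips steps the log already records as done.

```python
import json
import os
import tempfile
from functools import reduce
from pathlib import Path
from typing import Any, Optional

State = dict[str, Any]
Event = dict[str, Any]

# Hypothetical initial state for a multi-step agent task.
INITIAL_STATE: State = {"status": "new", "completed_steps": []}


def reduce_state(state: State, event: Event) -> State:
    """Pure reducer: the result depends only on (state, event), never on clocks or I/O."""
    kind = event["type"]
    if kind == "step_completed":
        return {**state, "status": "running",
                "completed_steps": state["completed_steps"] + [event["step"]]}
    if kind == "task_finished":
        return {**state, "status": "done"}
    raise ValueError(f"unknown event type: {kind!r}")


class CheckpointLog:
    """Append-only JSON-lines log with one file per thread/task ID (hypothetical layout)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, thread_id: str) -> Path:
        return self.root / f"{thread_id}.jsonl"

    def append(self, thread_id: str, event: Event) -> None:
        with open(self._path(thread_id), "a", encoding="utf-8") as f:
            f.write(json.dumps(event, sort_keys=True) + "\n")
            f.flush()
            os.fsync(f.fileno())  # make the checkpoint durable before continuing

    def events(self, thread_id: str) -> list[Event]:
        path = self._path(thread_id)
        if not path.exists():
            return []
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def replay(self, thread_id: str) -> State:
        """Rebuild the current state anywhere by folding the reducer over the log."""
        return reduce(reduce_state, self.events(thread_id), dict(INITIAL_STATE))


def run_task(log: CheckpointLog, thread_id: str, steps: list[str],
             crash_after: Optional[int] = None) -> State:
    """Resume from the last checkpoint, skipping steps the log already records."""
    state = log.replay(thread_id)
    done = set(state["completed_steps"])
    executed = 0
    for step in steps:
        if step in done:
            continue  # already checkpointed: do not duplicate the work
        if crash_after is not None and executed == crash_after:
            raise RuntimeError("simulated crash")
        # ... the real work for `step` would happen here ...
        log.append(thread_id, {"type": "step_completed", "step": step})
        executed += 1
    if state["status"] != "done":
        log.append(thread_id, {"type": "task_finished"})
    return log.replay(thread_id)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = CheckpointLog(Path(tmp))
        steps = ["fetch", "summarize", "report"]
        try:
            run_task(log, "task-42", steps, crash_after=2)
        except RuntimeError:
            pass  # the process "dies" after two checkpointed steps
        # A fresh process (or another node) rebuilds the same state from the log.
        print(log.replay("task-42")["completed_steps"])  # ['fetch', 'summarize']
        print(run_task(log, "task-42", steps)["status"])  # done
```

Design note: a step is recorded only after its work finishes, so a crash between the work and the append reruns that one step on resume (at-least-once). Side effects that must not repeat should be idempotent or keyed by (thread ID, step).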
Journey Context:
In-memory 'shared state' disappears on a crash and is hard to audit. LangGraph's checkpointer saves a snapshot of the graph state at every super-step and records pending writes per task: if one node fails, the writes from sibling nodes that succeeded in the same super-step are kept, so resuming reruns only the failed node. Treat the checkpoint, not the LLM's ever-growing chat history, as the source of truth.
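A toy, library-free model of the super-step behaviour described above, assuming the nodes in a super-step run independently and each returns a partial state update. `CheckpointStore`, `run_graph`, and `make_node` are hypothetical and only mimic the idea; they are not LangGraph's API (in LangGraph you pass a checkpointer to `compile()` and a `thread_id` under `configurable` in the run config). The store is in memory here for brevity, and merging writes in sorted task-name order is an assumption standing in for per-key reducers.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

State = dict[str, Any]
Task = Callable[[State], State]  # a node receives state and returns a partial update


@dataclass
class CheckpointStore:
    """In-memory stand-in for a durable checkpointer (hypothetical)."""
    # (index of the next super-step, committed state after the previous one)
    checkpoints: list[tuple[int, State]] = field(default_factory=list)
    # Writes from tasks that succeeded, keyed by (super-step index, task name)
    pending_writes: dict[tuple[int, str], State] = field(default_factory=dict)


def run_graph(store: CheckpointStore, supersteps: list[dict[str, Task]],
              initial: State) -> State:
    # Resume from the last committed checkpoint, or start from the input.
    step, state = store.checkpoints[-1] if store.checkpoints else (0, dict(initial))
    while step < len(supersteps):
        tasks = supersteps[step]
        writes: dict[str, State] = {}
        errors: list[Exception] = []
        for name, node in tasks.items():
            key = (step, name)
            if key in store.pending_writes:
                writes[name] = store.pending_writes[key]  # succeeded earlier: reuse, don't rerun
                continue
            try:
                update = node(dict(state))
            except Exception as exc:
                errors.append(exc)
                continue
            store.pending_writes[key] = update  # saved even if a sibling fails
            writes[name] = update
        if errors:
            # No checkpoint for this super-step; successful siblings stay pending.
            raise RuntimeError(f"super-step {step} failed") from errors[0]
        # Every task succeeded: merge writes in a fixed order so the result is deterministic.
        new_state = dict(state)
        for name in sorted(writes):
            new_state.update(writes[name])
        store.checkpoints.append((step + 1, new_state))
        for name in tasks:
            store.pending_writes.pop((step, name), None)
        step, state = step + 1, new_state
    return state


def make_node(name: str, update: State, calls: dict[str, int],
              fail_first: bool = False) -> Task:
    """Hypothetical node that counts its runs and can fail on its first call."""
    def run(state: State) -> State:
        calls[name] = calls.get(name, 0) + 1
        if fail_first and calls[name] == 1:
            raise TimeoutError(f"{name}: simulated transient failure")
        return update
    return run


if __name__ == "__main__":
    calls: dict[str, int] = {}
    graph = [
        {"plan": make_node("plan", {"plan": "search then write"}, calls)},
        {"search_web": make_node("search_web", {"web": "3 results"}, calls),
         "search_docs": make_node("search_docs", {"docs": "2 results"}, calls, fail_first=True)},
        {"write": make_node("write", {"answer": "draft"}, calls)},
    ]
    store = CheckpointStore()
    try:
        run_graph(store, graph, {"question": "q"})
    except RuntimeError as exc:
        print("first run:", exc)
    print("final:", run_graph(store, graph, {"question": "q"}))
    print("calls:", calls)  # search_web ran once; only search_docs was retried
```

On the first run `search_docs` fails, so no checkpoint is committed for that super-step, but the write from `search_web` is kept as pending. The second run resumes from the last checkpoint, reuses that write, and calls only the failed node again.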
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:50:40.126263+00:00— report_created — created