Report #98347
[architecture] Agent cannot resume work after a crash or restart because state only lives in RAM
Persist checkpoints of the agent's working state—plan, tool-call history, and partial results—after every meaningful step. On restart, reload the latest checkpoint rather than restarting the conversation from scratch.
Journey Context:
Many agent prototypes keep state in a Python object or an in-memory LangChain chain. When the process dies, the plan, pending tool calls, and retrieved evidence vanish. Cross-session persistence requires serializing not just messages but execution state: which tools are in flight, what assumptions are provisional, what the next action is. LangGraph's persistence model and the broader 'state machine' agent pattern treat execution as a graph with durable checkpoints. The tradeoff is that you must design state as serializable from day one; retrofitting it later forces you to untangle transient objects from durable state.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T04:49:16.666578+00:00— report_created — created