Report #1150
[architecture] How do I make an agent survive crashes, resume later, and maintain long-term memory across sessions?
Use a graph orchestrator with checkpointing and explicitly separate short-term thread memory from long-term cross-thread stores. In LangGraph, compile the graph with a Checkpointer such as PostgresSaver for thread-scoped state and a Store for durable user preferences and facts. Never keep production agent state only in RAM.
Journey Context:
Agents without persistence lose their context when the process restarts, so they cannot support human-in-the-loop review, approval gates, or long-running workflows. Checkpointing saves the full graph state after every node transition, keyed by a thread_id, so the agent can resume exactly where it left off. Stores hold data that must outlive a single thread, such as user preferences and facts. This separation is the foundation of reliability, debugging via time travel, and auditability in regulated environments.
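The split between thread-scoped checkpoints and a cross-thread store can be illustrated with a small standard-library sketch. This is not LangGraph's API: the Checkpointer and Store classes, the run function, and the plan/act nodes are hypothetical names chosen for illustration, and sqlite3 stands in for a production database such as Postgres. The "crash" is simulated with an exception, and the in-memory database is used only so the example runs anywhere; a real deployment would need a file or server-backed database to survive a process restart.

```python
import json
import sqlite3


class Checkpointer:
    """Thread-scoped state: one snapshot row per (thread_id, step)."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT, step INTEGER, state TEXT, "
            "PRIMARY KEY (thread_id, step))"
        )

    def save(self, thread_id, step, state):
        self.conn.execute(
            "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self.conn.commit()

    def latest(self, thread_id):
        """Return (completed_steps, state), or (0, None) for a new thread."""
        row = self.conn.execute(
            "SELECT step, state FROM checkpoints WHERE thread_id = ? "
            "ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return (row[0], json.loads(row[1])) if row else (0, None)


class Store:
    """Long-term memory shared across threads, keyed by namespace and key."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS store ("
            "namespace TEXT, key TEXT, value TEXT, "
            "PRIMARY KEY (namespace, key))"
        )

    def put(self, namespace, key, value):
        self.conn.execute(
            "INSERT OR REPLACE INTO store VALUES (?, ?, ?)",
            (namespace, key, json.dumps(value)),
        )
        self.conn.commit()

    def get(self, namespace, key):
        row = self.conn.execute(
            "SELECT value FROM store WHERE namespace = ? AND key = ?",
            (namespace, key),
        ).fetchone()
        return json.loads(row[0]) if row else None


# Hypothetical two-node workflow; each node returns an updated state dict.
def plan(state):
    return {**state, "plan": "draft reply"}


def act(state):
    return {**state, "result": "reply sent"}


NODES = [plan, act]


def run(thread_id, checkpointer, initial_state, crash_before=None):
    """Run the nodes in order, checkpointing after each one.

    If the thread already has a checkpoint, execution resumes from the
    first node that has not completed and initial_state is ignored.
    """
    step, state = checkpointer.latest(thread_id)
    if state is None:
        state = initial_state
    for index in range(step, len(NODES)):
        if index == crash_before:
            raise RuntimeError("simulated crash")
        state = NODES[index](state)
        checkpointer.save(thread_id, index + 1, state)
    return state


conn = sqlite3.connect(":memory:")  # demo only; not durable across restarts
checkpointer = Checkpointer(conn)
store = Store(conn)
store.put("user-42", "language", "en")  # long-term fact, not tied to a thread

try:
    run("thread-1", checkpointer, {"input": "hi"}, crash_before=1)
except RuntimeError:
    pass  # the first node's checkpoint is already saved

# Resuming the same thread_id continues from the second node.
resumed = run("thread-1", checkpointer, {"input": "ignored"})
```

Because each step is stored as its own row, earlier snapshots of a thread remain available for inspection, which is the basis for the time-travel debugging mentioned above. A different thread_id starts with no checkpoint, yet it can still read the same entry from the store.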
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T18:53:09.725090+00:00— report_created — created