Report #883
[architecture] How should I persist agent state across turns, crashes, and human approvals?
Use a checkpointer to snapshot the shared agent state after every step, keyed by a stable thread_id. Store short-term thread memory in the checkpointer and long-term cross-thread memory in a separate store. For production, use Postgres/SQLite/Redis-backed checkpointers, not in-memory savers.
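A minimal sketch of that split, assuming only Python's standard-library sqlite3. This is not LangGraph's API: SqliteCheckpointer, SqliteStore, and the table names are hypothetical. Checkpoints are immutable rows keyed by (thread_id, step). The long-term store is keyed by a namespace such as a user id, so every thread can read it.

```python
import json
import sqlite3
from typing import Any, Optional


class SqliteCheckpointer:
    """Hypothetical short-term memory: one snapshot per (thread_id, step)."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT NOT NULL, step INTEGER NOT NULL, state TEXT NOT NULL, "
            "PRIMARY KEY (thread_id, step))"
        )
        conn.commit()

    def put(self, thread_id: str, step: int, state: dict) -> None:
        # Plain INSERT: a duplicate (thread_id, step) raises IntegrityError
        # instead of silently overwriting history.
        self.conn.execute(
            "INSERT INTO checkpoints (thread_id, step, state) VALUES (?, ?, ?)",
            (thread_id, step, json.dumps(state)),
        )
        self.conn.commit()

    def latest(self, thread_id: str) -> Optional[tuple]:
        # Returns (step, state) for the newest snapshot of this thread only.
        row = self.conn.execute(
            "SELECT step, state FROM checkpoints WHERE thread_id = ? "
            "ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return None if row is None else (row[0], json.loads(row[1]))


class SqliteStore:
    """Hypothetical long-term memory: keyed by namespace, shared by all threads."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS store ("
            "namespace TEXT NOT NULL, key TEXT NOT NULL, value TEXT NOT NULL, "
            "PRIMARY KEY (namespace, key))"
        )
        conn.commit()

    def put(self, namespace: str, key: str, value: Any) -> None:
        # Upsert: long-term facts are updated in place, unlike checkpoints.
        self.conn.execute(
            "INSERT OR REPLACE INTO store (namespace, key, value) VALUES (?, ?, ?)",
            (namespace, key, json.dumps(value)),
        )
        self.conn.commit()

    def get(self, namespace: str, key: str) -> Any:
        row = self.conn.execute(
            "SELECT value FROM store WHERE namespace = ? AND key = ?",
            (namespace, key),
        ).fetchone()
        return None if row is None else json.loads(row[0])


# Demo only: ":memory:" disappears with the process; use a file path
# (or a Postgres-backed equivalent) so snapshots survive a restart.
conn = sqlite3.connect(":memory:")
checkpointer = SqliteCheckpointer(conn)
store = SqliteStore(conn)

# Two conversations (threads) for the same user keep separate histories.
checkpointer.put("thread-a", 0, {"messages": ["hi"]})
checkpointer.put("thread-a", 1, {"messages": ["hi", "hello!"]})
checkpointer.put("thread-b", 0, {"messages": ["new topic"]})

# A fact learned in thread-a, saved under the user's namespace, is visible to thread-b.
store.put("user-42", "preferred_language", "Python")
```

In LangGraph itself, the equivalent wiring is to pass a checkpointer (and, for cross-thread memory, a store) when compiling the graph, and to supply the thread_id in the run config. Check the docs for the exact class names in your installed version.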
Journey Context:
Stateless agents lose context on every request and cannot recover from mid-run failures. A checkpointer turns an agent into a durable state machine: it can resume after a crash, support human-in-the-loop interrupts, and enable time-travel debugging. LangGraph distinguishes short-term memory (per-thread checkpoints) from long-term memory (cross-thread stores). The common traps are storing everything in one large mutable dict and relying on an in-memory checkpointer in production. Instead, version state per super-step, keep writes idempotent so that replaying a step after a crash or resume does not repeat its side effects, and scope memory correctly by thread.
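The recovery behaviour described above can be sketched as a small step loop. This is a hypothetical, stdlib-only illustration, not LangGraph code. A dict of per-thread snapshot lists stands in for a durable checkpointer. Each completed node appends a new versioned snapshot; running one node per step simplifies LangGraph's super-steps, which can run several nodes. An Interrupt exception pauses the run for human approval. The refund side effect is keyed by order id, so replaying the node after a crash is harmless. All names (run, Interrupt, REFUNDS_ISSUED, the node functions) are illustrative.

```python
import copy
from typing import Callable, Optional

State = dict

# Stand-in for a durable checkpointer: thread_id -> ordered list of snapshots.
CHECKPOINTS: dict[str, list[State]] = {}
# Stand-in for an external system (e.g. a payments API), keyed by order id.
REFUNDS_ISSUED: set[str] = set()
# Arms a single simulated crash inside the `act` node.
CRASH = {"armed": True}


class Interrupt(Exception):
    """Raised by a node to pause the run until a human supplies input."""


def save(thread_id: str, state: State) -> None:
    # Deep-copy so mutating live state can never rewrite an earlier version.
    CHECKPOINTS.setdefault(thread_id, []).append(copy.deepcopy(state))


def plan(state: State) -> State:
    return {**state, "plan": f"refund order {state['order_id']}"}


def approve(state: State) -> State:
    if not state.get("approved"):
        raise Interrupt("waiting for human approval")
    return state


def act(state: State) -> State:
    # Idempotent write: adding the same order id twice issues only one refund.
    REFUNDS_ISSUED.add(state["order_id"])
    if CRASH["armed"]:
        CRASH["armed"] = False
        raise RuntimeError("simulated crash before the checkpoint is written")
    return {**state, "done": True}


NODES: list[Callable[[State], State]] = [plan, approve, act]


def run(thread_id: str, initial: Optional[State] = None,
        resume: Optional[State] = None) -> State:
    history = CHECKPOINTS.get(thread_id)
    if history:
        state = copy.deepcopy(history[-1])  # resume from the latest snapshot
    elif initial is not None:
        state = {**initial, "step": 0}
        save(thread_id, state)
    else:
        raise ValueError(f"no checkpoint for {thread_id!r} and no initial state")
    if resume:
        state.update(resume)  # e.g. the human's approval decision
    while state["step"] < len(NODES):
        try:
            state = NODES[state["step"]](state)
        except Interrupt:
            return state  # paused; the next run() retries this same step
        state = {**state, "step": state["step"] + 1}
        save(thread_id, state)  # one new version per completed step
    return state


paused = run("t1", initial={"order_id": "A-17"})  # stops at `approve`
try:
    run("t1", resume={"approved": True})  # approval is saved, then `act` crashes
except RuntimeError:
    pass
final = run("t1")  # restart: replays `act` from the last snapshot
```

Because every version is kept, CHECKPOINTS["t1"][1] holds the state just before approval. Inspecting an older snapshot, or forking a new thread from it, is the idea behind time-travel debugging.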
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T14:54:28.769908+00:00— report_created — created