Report #653
[architecture] How should I persist agent state so I can resume, audit, and support human-in-the-loop?
Use a typed central state object plus a checkpointer that snapshots state after every node/step (e.g., LangGraph Checkpointer or a durable workflow engine). Separate short-term thread state (checkpoints) from long-term cross-thread memory (store/vector DB) and externalize durable data rather than keeping it only in an in-memory dict.
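A minimal sketch of the pattern above, using only the Python standard library. This is not LangGraph's API: AgentState, SqliteCheckpointer, the plan/act nodes, and run are hypothetical names chosen for illustration. It assumes state is JSON-serializable and that one append-only row per completed step is enough for resume and audit.

```python
import json
import sqlite3
from dataclasses import dataclass, field, asdict
from typing import Callable, Optional


@dataclass
class AgentState:
    """Typed central state; every field must be JSON-serializable."""
    messages: list[str] = field(default_factory=list)
    next_step: int = 0               # index of the next node to run
    awaiting_approval: bool = False  # True while paused for a human


class SqliteCheckpointer:
    """Hypothetical checkpointer: one append-only row per completed step."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "thread_id TEXT NOT NULL, state TEXT NOT NULL)"
        )

    def save(self, thread_id: str, state: AgentState) -> None:
        self.conn.execute(
            "INSERT INTO checkpoints (thread_id, state) VALUES (?, ?)",
            (thread_id, json.dumps(asdict(state))),
        )
        self.conn.commit()

    def latest(self, thread_id: str) -> Optional[AgentState]:
        row = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ? "
            "ORDER BY id DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return AgentState(**json.loads(row[0])) if row else None

    def history(self, thread_id: str) -> list[AgentState]:
        """Full audit trail for one thread, oldest snapshot first."""
        rows = self.conn.execute(
            "SELECT state FROM checkpoints WHERE thread_id = ? ORDER BY id",
            (thread_id,),
        ).fetchall()
        return [AgentState(**json.loads(r[0])) for r in rows]


def plan(state: AgentState) -> AgentState:
    state.messages.append("plan: draft refund email")
    state.awaiting_approval = True   # pause before the side effect
    return state


def act(state: AgentState) -> AgentState:
    state.messages.append("act: email sent")
    return state


NODES: list[Callable[[AgentState], AgentState]] = [plan, act]


def run(thread_id: str, cp: SqliteCheckpointer, approved: bool = False) -> AgentState:
    """Resume from the latest checkpoint and snapshot after every node."""
    state = cp.latest(thread_id) or AgentState()
    if state.awaiting_approval:
        if not approved:
            return state             # still paused; nothing new is written
        state.awaiting_approval = False
    while state.next_step < len(NODES):
        state = NODES[state.next_step](state)
        state.next_step += 1
        cp.save(thread_id, state)
        if state.awaiting_approval:
            break                    # hand control to the human
    return state
```

Calling run("t1", cp) executes plan, writes a checkpoint, and stops with awaiting_approval set. A later run("t1", cp, approved=True), possibly from a different process if the checkpointer points at a file path instead of ":memory:", picks up at act. The history method returns every snapshot in order, which is what an audit or a "time travel" replay would read.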
Journey Context:
Stateless agents lose context on crash and cannot pause for human approval. LangGraph distinguishes Checkpointers, which persist thread-scoped snapshots for continuity, time travel, and fault tolerance, from Stores, which persist application-defined key-value data across threads. The common anti-pattern is conflating conversation history with durable facts: history belongs in checkpoints, while user preferences and extracted facts belong in a store or vector DB. Most production agents need both.
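The split between thread-scoped history and cross-thread facts can be sketched as two tables with different keys. This is an illustrative stdlib-only example, not LangGraph's Checkpointer or Store interface: the Persistence class, its method names, and the "user:42" namespace are assumptions made for the sketch.

```python
import json
import sqlite3
from typing import Any, Optional


class Persistence:
    """Hypothetical wrapper holding both scopes in one SQLite database."""

    def __init__(self, path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(path)
        self.conn.executescript(
            """
            CREATE TABLE IF NOT EXISTS checkpoints (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                thread_id TEXT NOT NULL,
                history TEXT NOT NULL
            );
            CREATE TABLE IF NOT EXISTS store (
                namespace TEXT NOT NULL,
                key TEXT NOT NULL,
                value TEXT NOT NULL,
                PRIMARY KEY (namespace, key)
            );
            """
        )

    # Thread-scoped: conversation history, keyed by thread_id.
    def history(self, thread_id: str) -> list[str]:
        row = self.conn.execute(
            "SELECT history FROM checkpoints WHERE thread_id = ? "
            "ORDER BY id DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return json.loads(row[0]) if row else []

    def append_turn(self, thread_id: str, message: str) -> None:
        # Each turn writes a new snapshot; older rows stay for audit.
        snapshot = self.history(thread_id) + [message]
        self.conn.execute(
            "INSERT INTO checkpoints (thread_id, history) VALUES (?, ?)",
            (thread_id, json.dumps(snapshot)),
        )
        self.conn.commit()

    # Cross-thread: durable facts, keyed by (namespace, key).
    def put(self, namespace: str, key: str, value: Any) -> None:
        self.conn.execute(
            "INSERT OR REPLACE INTO store (namespace, key, value) "
            "VALUES (?, ?, ?)",
            (namespace, key, json.dumps(value)),
        )
        self.conn.commit()

    def get(self, namespace: str, key: str) -> Optional[Any]:
        row = self.conn.execute(
            "SELECT value FROM store WHERE namespace = ? AND key = ?",
            (namespace, key),
        ).fetchone()
        return json.loads(row[0]) if row else None
```

With this layout, a fact extracted in one conversation, for example put("user:42", "units", "metric"), is visible to a brand-new thread through get, while history("thread-2") for that new thread is still empty. Keeping the two keys separate is what prevents conversation history from being mistaken for durable user data.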
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T10:57:32.233944+00:00— report_created — created