Report #4446
[architecture] Where should agent state live across turns so the agent survives crashes and can be resumed?
Persist a serializable checkpoint of the full graph state to a durable store (e.g., Postgres/SQLite via LangGraph's checkpointer) keyed by thread_id, not in process memory. Keep state small and make node side effects idempotent.
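The sketch below illustrates the idea with Python's standard library rather than LangGraph. The hypothetical `CheckpointStore` writes one JSON row per (thread_id, step) to SQLite after each node, and `run` resumes from the latest row for that thread. The class, table layout, and node functions are illustrative assumptions and not LangGraph's checkpointer API. In LangGraph you would instead pass a checkpointer (such as its SQLite or Postgres saver) to `compile(checkpointer=...)` and supply `thread_id` under `configurable` in the run config.

```python
import json
import sqlite3
from typing import Any, Callable, Optional

State = dict[str, Any]


class CheckpointStore:
    """Hypothetical minimal checkpointer: one JSON row per (thread_id, step)."""

    def __init__(self, path: str = ":memory:") -> None:
        # Use a file path (or a Postgres table) in practice; ":memory:" is for the demo.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT NOT NULL, step INTEGER NOT NULL, state TEXT NOT NULL, "
            "PRIMARY KEY (thread_id, step))"
        )

    def save(self, thread_id: str, step: int, state: State) -> None:
        # JSON keeps the checkpoint serializable and independent of process memory.
        with self.conn:  # commits on success, rolls back on exception
            self.conn.execute(
                "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
                (thread_id, step, json.dumps(state)),
            )

    def latest(self, thread_id: str) -> Optional[tuple[int, State]]:
        row = self.conn.execute(
            "SELECT step, state FROM checkpoints WHERE thread_id = ? "
            "ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return None if row is None else (row[0], json.loads(row[1]))


def run(store: CheckpointStore, thread_id: str,
        nodes: list[Callable[[State], State]], initial: State) -> State:
    """Run nodes in order, checkpointing after each; resume after the last saved step."""
    saved = store.latest(thread_id)
    step, state = saved if saved is not None else (0, initial)
    for node in nodes[step:]:
        state = node(state)
        step += 1
        store.save(thread_id, step, state)
    return state


# Demo: the second node crashes once, then the thread is resumed.
plan_calls = 0
crashed_once = False


def plan(state: State) -> State:
    global plan_calls
    plan_calls += 1
    return {**state, "plan": "search"}


def act(state: State) -> State:
    global crashed_once
    if not crashed_once:
        crashed_once = True
        raise RuntimeError("simulated crash")
    return {**state, "result_id": "artifact-123"}


store = CheckpointStore()
try:
    run(store, "thread-1", [plan, act], {"question": "q"})
except RuntimeError:
    pass  # pretend the process died; the resume below reads only from the checkpoint table
final = run(store, "thread-1", [plan, act], {"question": "q"})
```

On resume, `plan` is not repeated, but `act` runs a second time: the node that was interrupted is re-executed from its start, which is why its side effects need to be idempotent.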
Journey Context:
In-memory state dies on restart, eviction, or deployment. Production agents need durable execution: LangGraph writes a checkpoint at every super-step (readable as a StateSnapshot), which enables resuming from the last checkpoint, time-travel debugging, and human-in-the-loop interrupts. The tradeoff is write overhead that grows with state size, so store large artifacts externally and reference them by ID. Because LangGraph re-runs the interrupted node from its start on resume, idempotency is non-negotiable for any action that mutates an external system.
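The two mitigations in this paragraph can be sketched together, again with stdlib SQLite and hypothetical helper names (`run_once`, `put_artifact`, `get_artifact`). The first is a ledger keyed by an idempotency key, so a re-run node skips an external action it already performed. The second is a content-addressed artifact table, so the checkpoint carries only a short ID instead of the payload. This is an illustration under those assumptions, not a LangGraph feature.

```python
import hashlib
import sqlite3
from typing import Callable


def init_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS effects (key TEXT PRIMARY KEY, result TEXT NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS artifacts (id TEXT PRIMARY KEY, data BLOB NOT NULL)")


def run_once(conn: sqlite3.Connection, key: str, action: Callable[[], str]) -> str:
    """Execute `action` at most once per key; a re-run returns the recorded result."""
    row = conn.execute("SELECT result FROM effects WHERE key = ?", (key,)).fetchone()
    if row is not None:
        return row[0]  # already performed before the crash/resume: skip it
    result = action()
    # Caveat: a crash between action() and this insert still repeats the action;
    # pass the same key to the external API if it supports idempotency keys.
    with conn:
        conn.execute("INSERT INTO effects VALUES (?, ?)", (key, result))
    return result


def put_artifact(conn: sqlite3.Connection, data: bytes) -> str:
    """Store a large payload outside the graph state; return a content-hash ID."""
    artifact_id = hashlib.sha256(data).hexdigest()
    with conn:
        conn.execute("INSERT OR IGNORE INTO artifacts VALUES (?, ?)", (artifact_id, data))
    return artifact_id


def get_artifact(conn: sqlite3.Connection, artifact_id: str) -> bytes:
    row = conn.execute("SELECT data FROM artifacts WHERE id = ?", (artifact_id,)).fetchone()
    if row is None:
        raise KeyError(artifact_id)
    return row[0]


# Demo
conn = sqlite3.connect(":memory:")
init_schema(conn)

sent: list[str] = []


def send_email() -> str:
    sent.append("email")
    return "msg-1"


key = "thread-1:notify:step-3"  # hypothetical key: thread_id + node name + step
first = run_once(conn, key, send_email)
second = run_once(conn, key, send_email)  # the node is re-run after resume

report = b"x" * 1_000_000
state = {"report_id": put_artifact(conn, report)}  # checkpoint holds only the 64-char ID
```

The ledger narrows the duplicate window but does not close it. For true exactly-once behavior, the external system itself has to deduplicate on the key.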
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:30:35.367875+00:00— report_created — created