Report #98371
[architecture] How should an agent manage state across turns, failures, and restarts?
Model state as an explicit typed schema (TypedDict or Pydantic), checkpoint it after every step, and separate thread-scoped state (checkpointer) from cross-thread memory (store). Never rely on implicit global variables or raw message lists as your only source of state.
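As a rough illustration of "typed state, checkpointed after every step", here is a minimal standard-library sketch. It does not use LangGraph's API; `AgentState`, `save_checkpoint`, `load_checkpoint`, `run`, and the JSON-file-per-thread layout are all hypothetical names and choices made up for this example.

```python
import json
import tempfile
from pathlib import Path
from typing import List, TypedDict


class AgentState(TypedDict):
    """Explicit schema: every field the agent depends on is named here."""
    thread_id: str
    step: int            # index of the next step to run
    results: List[str]   # outputs of completed steps


def save_checkpoint(directory: Path, state: AgentState) -> None:
    """Persist the full state for one thread (write to a temp file, then rename)."""
    target = directory / f"{state['thread_id']}.json"
    tmp = target.with_suffix(".tmp")
    tmp.write_text(json.dumps(state))
    tmp.replace(target)


def load_checkpoint(directory: Path, thread_id: str) -> AgentState:
    """Return the last saved state, or a fresh one if the thread is new."""
    target = directory / f"{thread_id}.json"
    if target.exists():
        return json.loads(target.read_text())
    return AgentState(thread_id=thread_id, step=0, results=[])


def run(directory: Path, thread_id: str, steps: List[str],
        crash_at: int = -1) -> AgentState:
    """Run the remaining steps, checkpointing after each one."""
    state = load_checkpoint(directory, thread_id)
    while state["step"] < len(steps):
        if state["step"] == crash_at:
            raise RuntimeError("simulated crash")
        # Placeholder for real work (an LLM or tool call in a real agent).
        state["results"].append(f"done:{steps[state['step']]}")
        state["step"] += 1
        save_checkpoint(directory, state)
    return state


def demo() -> AgentState:
    """Crash before the third step, then resume from the checkpoint."""
    steps = ["plan", "search", "summarize"]
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        try:
            run(directory, "thread-1", steps, crash_at=2)
        except RuntimeError:
            pass  # the process "restarts" here
        return run(directory, "thread-1", steps)
```

Calling `demo()` runs two steps, hits the simulated crash, and then the second `run` call picks up at the third step without repeating the first two, because the step index lives in the checkpointed state rather than in a local variable.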
Journey Context:
LangGraph distinguishes checkpointers (short-term, per-thread, for resume/time-travel/fault tolerance) from stores (long-term, cross-thread, for user facts and preferences). A common mistake is to dump everything into the chat history and hope the LLM remembers it; context windows are limited, and models get distracted by stale content. Typed state forces you to decide what matters, makes observability and testing easier, and lets you resume exactly where a crash or human-in-the-loop interruption happened.
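The checkpointer/store split can be sketched with two small in-memory classes. This is only an illustration of the separation of scopes, not LangGraph's actual classes or signatures; `ThreadCheckpointer`, `MemoryStore`, and the `"user-42"` namespace are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ThreadCheckpointer:
    """Short-term: a history of state snapshots per thread_id."""
    _history: Dict[str, List[dict]] = field(default_factory=dict)

    def put(self, thread_id: str, state: dict) -> None:
        # Shallow copy so later top-level mutation does not rewrite history.
        self._history.setdefault(thread_id, []).append(dict(state))

    def latest(self, thread_id: str) -> dict:
        snapshots = self._history.get(thread_id, [])
        return dict(snapshots[-1]) if snapshots else {}

    def at(self, thread_id: str, index: int) -> dict:
        """Fetch an earlier snapshot (a simple form of time travel)."""
        return dict(self._history[thread_id][index])


@dataclass
class MemoryStore:
    """Long-term: facts keyed by (namespace, key), shared across threads."""
    _items: Dict[Tuple[str, str], str] = field(default_factory=dict)

    def put(self, namespace: str, key: str, value: str) -> None:
        self._items[(namespace, key)] = value

    def get(self, namespace: str, key: str, default: str = "") -> str:
        return self._items.get((namespace, key), default)


def demo() -> Tuple[dict, str, dict]:
    checkpointer = ThreadCheckpointer()
    store = MemoryStore()
    # Thread A learns a user preference and records its own progress.
    store.put("user-42", "language", "French")
    checkpointer.put("thread-a", {"step": 1})
    checkpointer.put("thread-a", {"step": 2})
    # Thread B has no thread state of its own but can read the shared fact.
    return (
        checkpointer.latest("thread-b"),
        store.get("user-42", "language"),
        checkpointer.at("thread-a", 0),
    )
```

In this sketch a new thread starts with empty thread-scoped state yet still sees the user's stored preference, while the first thread's earlier snapshot stays retrievable by index.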
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-27T04:51:28.459680+00:00— report_created — created