Report #49286
[frontier] Agent crashes and loses state on long-running tasks
Implement checkpointed state persistence with thread-scoped memory using LangGraph's PostgresSaver or Redis checkpointer
Journey Context:
Naive agents keep state in memory, so a crash or restart loses all progress. Production agents need durable state machines. LangGraph's checkpointer pattern serializes agent state (messages, scratchpad) at each node, which enables crash recovery, human-in-the-loop interruptions, and horizontal scaling. Alternatives such as plain JSON files break under concurrent access. The pattern separates compute from state, so a spot instance can be terminated without data loss.
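The checkpoint-per-node idea can be sketched with the Python standard library alone. This is an illustrative stand-in, not LangGraph's API: `SqliteCheckpointer`, `run`, the `checkpoints` table, the node functions, and the `crash_before` switch are all hypothetical names invented for this sketch, and the in-memory SQLite database only keeps the demo self-contained. A real deployment would point at a shared store such as Postgres or Redis, as the fix above suggests.

```python
import json
import sqlite3
from typing import Any, Callable, Dict, List, Optional, Tuple

State = Dict[str, Any]
Node = Tuple[str, Callable[[State], State]]


class SqliteCheckpointer:
    """Hypothetical stand-in for a durable checkpointer, keyed by thread_id."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS checkpoints ("
            "thread_id TEXT NOT NULL, step INTEGER NOT NULL, state TEXT NOT NULL, "
            "PRIMARY KEY (thread_id, step))"
        )

    def put(self, thread_id: str, step: int, state: State) -> None:
        # One transaction per checkpoint, so a crash never leaves a partial write.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO checkpoints VALUES (?, ?, ?)",
                (thread_id, step, json.dumps(state)),
            )

    def latest(self, thread_id: str) -> Optional[Tuple[int, State]]:
        row = self.conn.execute(
            "SELECT step, state FROM checkpoints "
            "WHERE thread_id = ? ORDER BY step DESC LIMIT 1",
            (thread_id,),
        ).fetchone()
        return None if row is None else (row[0], json.loads(row[1]))


def run(nodes: List[Node], saver: SqliteCheckpointer, thread_id: str,
        initial: State, crash_before: Optional[str] = None) -> State:
    """Run nodes in order, resuming from the thread's last checkpoint if any."""
    saved = saver.latest(thread_id)
    step, state = saved if saved is not None else (0, dict(initial))
    for i in range(step, len(nodes)):
        name, fn = nodes[i]
        if name == crash_before:  # simulated crash, for the demo only
            raise RuntimeError(f"simulated crash before node {name!r}")
        state = fn(state)
        saver.put(thread_id, i + 1, state)  # step = number of completed nodes
    return state


calls: List[str] = []  # records which nodes actually executed


def make_node(name: str) -> Node:
    def fn(state: State) -> State:
        calls.append(name)
        return {**state, "messages": state["messages"] + [name]}
    return (name, fn)


nodes = [make_node("plan"), make_node("act"), make_node("summarize")]
saver = SqliteCheckpointer(sqlite3.connect(":memory:"))

try:
    run(nodes, saver, "thread-1", {"messages": []}, crash_before="summarize")
except RuntimeError:
    pass  # the process "died" after two nodes were checkpointed

resumed = run(nodes, saver, "thread-1", {"messages": []})  # resumes at "summarize"
other = run(nodes, saver, "thread-2", {"messages": []})    # separate thread, fresh state
```

In this sketch the second `run` call for `thread-1` skips `plan` and `act` because their results are already stored, while `thread-2` starts from scratch. That is the thread-scoped behaviour the report describes. Because the state lives in the database and not in the worker, any worker holding the same connection details could pick up the thread.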
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:12:26.392122+00:00 — report_created — created