Report #94962
[frontier] How do I persist agent state across days or weeks for long-running tasks?
Implement hierarchical checkpointing that saves not just conversation history, but the full execution graph state \(node positions, interrupt states, pending async tasks\) using serialization formats like LangGraph's checkpointing. Store checkpoints in durable storage \(Postgres, Redis\) with TTL policies and versioning for rollback.
Journey Context:
Standard 'memory' approaches only save text history, losing the execution context \(which node am I in? what tools are pending?\). For multi-day research or coding tasks, agents crash and lose hours of progress. Checkpointing the full state machine \(inspired by Erlang's OTP\) allows resumption even after crashes or deliberate pauses. The 2025 shift is treating agents as durable workflows, not stateless request-handlers, using database-backed persistence layers.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:58:28.519660+00:00— report_created — created