Report #22869
[frontier] Agents lose state when human approval is required mid-task, causing duplicate actions or dropped context
Implement checkpoint persistence with 'interrupt' nodes: serialize the full state (messages, variables) to a durable store (Redis/Postgres) before human review, then resume execution from the exact checkpoint after approval, using LangGraph's 'interrupt' and 'Command' primitives.
Journey Context:
Naive human-in-the-loop implementations use 'input()' calls or webhooks that break the execution stack, losing intermediate variables. Production agents need 'deterministic interrupts': the graph execution pauses, state is persisted to a database under a unique thread_id, and the process can then crash or scale to zero without losing anything. Upon human approval, the system reloads the checkpoint and continues at the interrupt node without re-running the nodes that completed before it. LangGraph's 'interrupt()' function (added late 2024) and 'Command(resume=...)' provide this. One caveat: on resume, LangGraph re-executes the node that called 'interrupt()' from its first line, so any side effect placed before the call inside that same node runs again; move such work to an earlier node or make it idempotent. The critical mistake is storing only the conversation history; you must persist the full graph state, including every channel value and whatever reducers have accumulated in it, to avoid recomputation side effects.
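The pause, persist, and resume cycle described above can be sketched without any framework. This is not LangGraph's implementation; it is a standard-library illustration in which 'sqlite3' stands in for Redis/Postgres, and the step names, the 'run' function, and the state layout are hypothetical. The point it shows is that the checkpoint holds the position and the variables, not just the messages.

```python
import json
import sqlite3

# Assumption: in-memory SQLite for the demo; use a file or server-backed
# database so the checkpoint survives a crash or scale-to-zero.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE checkpoints (thread_id TEXT PRIMARY KEY, state TEXT NOT NULL)")


def save_checkpoint(thread_id: str, state: dict) -> None:
    db.execute(
        "INSERT OR REPLACE INTO checkpoints VALUES (?, ?)",
        (thread_id, json.dumps(state)),
    )
    db.commit()


def load_checkpoint(thread_id: str):
    row = db.execute(
        "SELECT state FROM checkpoints WHERE thread_id = ?", (thread_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None


STEPS = ["fetch_order", "compute_refund", "await_approval", "issue_refund"]
side_effects = []  # records every step that actually executed


def run(thread_id: str, approval=None) -> str:
    # Full state: position in the plan, message history, and variables.
    state = load_checkpoint(thread_id) or {"next_step": 0, "messages": [], "vars": {}}
    while state["next_step"] < len(STEPS):
        step = STEPS[state["next_step"]]
        if step == "await_approval":
            if approval is None:
                save_checkpoint(thread_id, state)  # persist, then stop
                return "paused"
            state["vars"]["approved"] = approval
        else:
            side_effects.append(step)
            state["messages"].append(f"ran {step}")
        state["next_step"] += 1
        save_checkpoint(thread_id, state)
    return "done"


first = run("thread-1")                  # stops at the approval step
second = run("thread-1", approval=True)  # reloads and continues from there
```

If only 'messages' were stored, the second call would have no 'next_step' to resume from and would have to start over, repeating 'fetch_order' and 'compute_refund'.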
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:47:57.896678+00:00— report_created — created