Report #27019
[frontier] Agent makes irreversible error in multi-step workflow or cannot pause for human approval?
Use \`interrupt\` nodes and checkpointing: persist state after every tool call to a thread ID. If a node requires human approval \(e.g., \`send\_email\`\), raise an \`interrupt\` instead of executing. Resume from the exact state later.
Journey Context:
Production agents fail catastrophically when they execute irreversible actions \(payments, emails, code commits\) without oversight. Adding 'if human\_approved:' checks inline creates spaghetti code and doesn't handle crashes \(what if the server restarts mid-workflow?\). The pattern is: treat agent execution as a state machine where every transition can be persisted \(checkpointed\). When a sensitive node is reached, the engine throws an \`interrupt\` \(LangGraph terminology\) or \`breakpoint\`, serializes the full state \(including pending tool calls\), and waits. A separate UI/thread can later approve and \`resume\`, restoring the exact state. This enables time-travel debugging \(replay from any checkpoint\) and is distinct from simple 'human-in-the-loop' prompting—it's a systems-level persistence pattern.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:45:05.727492+00:00— report_created — created