Report #42901
[frontier] Long-running agent workflows fail or time out when requiring human approval mid-execution
Use Mastra's suspend/resume primitives to pause agent workflows at decision points, persist state externally, and resume exactly from the interruption point after human input
Journey Context:
Traditional agent workflows handle human-in-the-loop review by blocking a thread or keeping a process alive while the human decides. That approach is fragile (a crash loses the in-flight state) and does not scale (every pending review ties up resources). Mastra's workflow engine instead models human-in-the-loop as suspendable checkpoints: the workflow serializes its full execution context (including agent state, pending tool calls, and data dependencies) to durable storage, releases all resources, and registers a webhook or polling endpoint for resumption. When the human responds, the workflow hydrates from the checkpoint and continues as if it had never stopped. This replaces long-polling human-in-the-loop with durable, serverless-friendly agent workflows. The tradeoffs are infrastructure complexity (a durable execution engine is required) and harder debugging (state is distributed), but in return the approach enables reliable multi-day agent workflows with asynchronous human collaboration that survive deployments and crashes.
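The checkpoint-and-hydrate pattern described above can be illustrated with a small, library-independent sketch. This is not Mastra's API: every name here (Checkpoint, InMemoryCheckpointStore, startRun, resumeRun, refundSteps) is hypothetical, and the in-memory map only stands in for durable storage. Consult the Mastra documentation for the actual suspend/resume calls.

```typescript
// Hypothetical sketch of suspend/resume; NOT Mastra's actual API.
type Context = Record<string, unknown>;

// What gets persisted while the workflow waits for a human.
type Checkpoint = { runId: string; stepIndex: number; context: Context };

type StepResult =
  | { kind: "done"; context: Context }
  | { kind: "suspend"; reason: string };

// A step receives resumeData only when it is re-entered after a suspension.
type Step = (context: Context, resumeData?: Context) => StepResult;

type RunOutcome =
  | { status: "suspended"; runId: string; reason: string }
  | { status: "completed"; context: Context };

// Stand-in for durable storage. Values are stored as JSON strings so that
// only serializable state survives, as it would in a real database.
class InMemoryCheckpointStore {
  private rows = new Map<string, string>();
  save(cp: Checkpoint): void {
    this.rows.set(cp.runId, JSON.stringify(cp));
  }
  load(runId: string): Checkpoint | undefined {
    const raw = this.rows.get(runId);
    return raw === undefined ? undefined : (JSON.parse(raw) as Checkpoint);
  }
  delete(runId: string): void {
    this.rows.delete(runId);
  }
}

// Runs steps from the checkpoint's index; returns instead of blocking
// when a step asks to suspend.
function execute(
  steps: Step[],
  store: InMemoryCheckpointStore,
  cp: Checkpoint,
  resumeData?: Context,
): RunOutcome {
  let context = cp.context;
  for (let i = cp.stepIndex; i < steps.length; i++) {
    // Only the step that suspended sees the human's answer.
    const result = steps[i](context, i === cp.stepIndex ? resumeData : undefined);
    if (result.kind === "suspend") {
      store.save({ runId: cp.runId, stepIndex: i, context });
      return { status: "suspended", runId: cp.runId, reason: result.reason };
    }
    context = result.context;
  }
  store.delete(cp.runId); // finished: the checkpoint is no longer needed
  return { status: "completed", context };
}

function startRun(
  steps: Step[],
  store: InMemoryCheckpointStore,
  runId: string,
  input: Context,
): RunOutcome {
  return execute(steps, store, { runId, stepIndex: 0, context: input });
}

function resumeRun(
  steps: Step[],
  store: InMemoryCheckpointStore,
  runId: string,
  resumeData: Context,
): RunOutcome {
  const cp = store.load(runId);
  if (cp === undefined) throw new Error(`no checkpoint for run ${runId}`);
  return execute(steps, store, cp, resumeData);
}

// Example workflow: draft a refund, wait for approval, then send it.
const refundSteps: Step[] = [
  (ctx) => ({ kind: "done", context: { ...ctx, draft: "Refund " + String(ctx.amount) } }),
  (ctx, resumeData) =>
    resumeData === undefined
      ? { kind: "suspend", reason: "needs human approval" }
      : { kind: "done", context: { ...ctx, approved: resumeData.approved === true } },
  (ctx) => ({ kind: "done", context: { ...ctx, sent: ctx.approved === true } }),
];

const store = new InMemoryCheckpointStore();
const paused = startRun(refundSteps, store, "run-1", { amount: 120 });
// The process could exit here; only the stored checkpoint is needed later.
const finished = resumeRun(refundSteps, store, "run-1", { approved: true });
console.log(paused.status, finished.status);
```

In this sketch the first call returns as soon as the approval step suspends, so nothing stays blocked during review. The second call could run in a different process, or days later, provided the store is backed by real durable storage rather than a map.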
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:28:40.938853+00:00— report_created — created