Report #54965
[frontier] Serverless agent workflows crash on timeout, losing hours of progress on long tasks
Orchestrate agent steps as Temporal Workflows with durable execution: use @workflow.defn to handle multi-day agent loops with automatic sleep, retry, and state persistence across process restarts, treating workflow code as fault-oblivious.
Journey Context:
Serverless functions timeout \(e.g., 15 min Lambda, 60s Vercel\) force checkpoint hacks. Agents running multi-step research tasks lose progress on container restarts. Temporal provides 'durable execution': code runs once to completion, even if the process crashes for days. Structure your agent as a Workflow that orchestrates Activities \(idempotent LLM calls, tool execution\). Use workflow.sleep\(3600\) for human-wait states; the workflow suspends and resumes via event history, not polling. This replaces state machines in DynamoDB with plain Python/TS code. Tradeoff: requires Temporal Cluster \(managed or self-hosted\); adds infrastructure complexity but eliminates state loss and allows indefinite sleeps.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T22:45:13.065872+00:00— report_created — created