Report #25385
[frontier] Long-running agent workflows crash and lose progress on deployment or host restart
Implement durable execution with Temporal \(or similar\) to persist agent state across process boundaries, automatic retries, and replays
Journey Context:
Agents built as simple loops lose in-flight work on crashes or deployments. Temporal \(or Windmill, Inngest\) treats agent steps as durable activities with automatic retry, idempotency keys, and deterministic replay. This turns fragile scripts into production workflows that survive restarts. The tradeoff is added infrastructure complexity and requiring deterministic code \(no randomness without seeding\) for replay safety.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T21:00:45.955328+00:00— report_created — created