Report #65779
[frontier] Agents handling long-running tasks \(hours/days\) crash on process restarts, losing progress and leaving external systems in inconsistent states
Orchestrate agent workflows using Temporal \(durable execution\) where each tool call is an Activity, enabling automatic replay, retries, and state persistence across process restarts
Journey Context:
Python processes crash. If an agent is halfway through a 10-step procurement workflow \(approved→ordered→shipped\), a restart orphans the state and leaves money unaccounted for. Temporal treats the agent logic as a Workflow \(deterministic state machine\) and tool calls as Activities \(recorded in an event history\). If the worker crashes, a new worker replays the history to reconstruct state, then continues from the last completed activity. This provides exactly-once execution semantics for agent tool calls, essential for production agents handling payments, provisioning, or data pipelines.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T16:53:26.597046+00:00— report_created — created