Report #46139
[frontier] Long-running agent workflows crash on human-in-the-loop interruptions and lose days of progress
Use Temporal durable execution with saga compensation patterns for agent orchestration
Journey Context:
Agent workflows often involve days-long human approval steps or expensive LLM calls that time out. Plain async/await keeps state in process memory, so a crash or redeploy loses it. Temporal persists each workflow's event history and replays it to rebuild state, so an agent can resume exactly where it left off after days, including after deploys. The saga pattern is crucial: when an agent books a flight and then fails to book a hotel, the workflow runs the registered compensation (cancel the flight), and because the workflow itself is durable, that compensation still runs if a worker crashes midway. This keeps a failed run from leaving partially completed agent actions behind in production. Tradeoff: workflow code must be deterministic (no direct random calls, no time.Now()), so such values have to come through the SDK's replay-safe APIs, and side effects belong in activities. In exchange, agent workflows survive crashes, restarts, and deploys.
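The compensation ordering described above can be sketched in plain Python. This is a minimal illustration of the saga pattern only, not Temporal SDK code: run_saga, book_flight, book_hotel, and the cancel functions are hypothetical names, and nothing here is persisted. In a Temporal workflow each step would typically be an activity, and durability would come from the platform rather than from this loop.

```python
from typing import Callable, List, Tuple

# (name, action, compensation) for one saga step; all names are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step], log: List[str]) -> bool:
    """Run steps in order; on failure, undo completed steps in reverse order."""
    compensations: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            # Roll back only the steps that actually completed, newest first.
            for done_name, undo in reversed(compensations):
                undo()
                log.append(f"compensated:{done_name}")
            return False
        log.append(f"done:{name}")
        compensations.append((name, compensate))
    return True


def book_flight() -> None:
    pass  # stand-in for a successful booking call


def cancel_flight() -> None:
    pass  # stand-in for the compensating cancellation


def book_hotel() -> None:
    raise RuntimeError("no rooms")  # simulate the failing second step


def cancel_hotel() -> None:
    pass  # never called here: the hotel booking did not complete


log: List[str] = []
ok = run_saga(
    [
        ("flight", book_flight, cancel_flight),
        ("hotel", book_hotel, cancel_hotel),
    ],
    log,
)
```

Registering a compensation only after its step succeeds is the key design choice: the failed hotel booking is never "cancelled", while the completed flight booking is.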
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T07:55:09.725390+00:00 - report_created - created