Report #58496
[frontier] Agent workflows fail mid-execution due to network flakes or LLM timeouts, leaving external state inconsistent and operations half-complete
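Use a durable execution engine (for example, Temporal.io) to orchestrate agent steps. Model each agent action as an idempotent activity so the engine can retry it automatically and replay the workflow after a crash. Activities run at least once, not exactly once; idempotency is what makes a retried action take effect only once. Separate the 'decider' (the LLM) from the 'worker' (the executor).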
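A minimal sketch of the decider/worker split with a replayable journal, in plain Python. It does not use the Temporal SDK: `Journal`, `run_step`, `decide`, and the two tools are hypothetical names introduced for illustration, and the journal lives in memory where a real engine would persist it.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Journal:
    """Stand-in for a durable event history (in memory here, persisted in a real engine)."""
    results: Dict[str, Any] = field(default_factory=dict)


def run_step(journal: Journal, step_id: str, action: Callable[[], Any],
             max_attempts: int = 3) -> Any:
    # Replay: a step whose result is already recorded is not executed again.
    if step_id in journal.results:
        return journal.results[step_id]
    last_error = None
    for _ in range(max_attempts):
        try:
            result = action()
        except ConnectionError as exc:  # transient failure: retry
            last_error = exc
            continue
        journal.results[step_id] = result  # record the result before moving on
        return result
    raise RuntimeError(f"{step_id} failed after {max_attempts} attempts") from last_error


def decide(goal: str) -> List[dict]:
    # Hypothetical decider: a real one would call an LLM and return the plan as data.
    return [
        {"id": "step-1", "tool": "fetch_invoice"},
        {"id": "step-2", "tool": "send_notice"},
    ]


calls = {"fetch_invoice": 0, "send_notice": 0}


def fetch_invoice() -> str:
    calls["fetch_invoice"] += 1
    if calls["fetch_invoice"] == 1:
        raise ConnectionError("simulated network flake")
    return "invoice-42"


def send_notice() -> str:
    calls["send_notice"] += 1
    return "sent"


TOOLS = {"fetch_invoice": fetch_invoice, "send_notice": send_notice}


def run_workflow(journal: Journal, goal: str) -> List[Any]:
    # The plan is journaled too, because an LLM decision is not deterministic.
    plan = run_step(journal, "plan", lambda: decide(goal))
    return [run_step(journal, s["id"], TOOLS[s["tool"]]) for s in plan]


journal = Journal()
first = run_workflow(journal, "bill customer")
second = run_workflow(journal, "bill customer")  # simulated restart: replays the journal
```

The second run returns the same results without calling any tool again, because every step is served from the journal. The worker only executes what the decider returned as data, so the LLM never touches external systems directly.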
Journey Context:
Early agent frameworks assumed happy-path execution. Production revealed partial failures: an agent sends an email but crashes before logging it, so the retry sends a duplicate. The insight: agents are workflows, not chatbots. Temporal records each completed activity result in the workflow's event history, so replay after a crash reuses recorded results instead of re-running finished tool calls. An activity that fails or times out before its result is recorded is retried, so the side effect itself can still happen more than once. Preventing double-charging or duplicate external actions therefore requires designing tools to be idempotent and separating LLM decision-making from the durable execution of those decisions.
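The duplicate-email case can be sketched with an idempotency key derived from the workflow and step identifiers. This is an illustration under assumptions: `EmailGateway` is a hypothetical provider that deduplicates on the key, and `make_idempotency_key` and the identifiers `wf-123` and `send-welcome-email` are made up for the example.

```python
import hashlib
from typing import Dict, Tuple


class EmailGateway:
    """Hypothetical email provider that deduplicates on an idempotency key."""

    def __init__(self) -> None:
        self.delivered: Dict[str, Tuple[str, str]] = {}  # key -> (recipient, body)

    def send(self, idempotency_key: str, to: str, body: str) -> str:
        if idempotency_key in self.delivered:
            # Same logical send seen before: do nothing a second time.
            return "duplicate-ignored"
        self.delivered[idempotency_key] = (to, body)
        return "sent"


def make_idempotency_key(workflow_id: str, step_id: str) -> str:
    # Stable across retries: derived from the step's identity, not from time or randomness.
    return hashlib.sha256(f"{workflow_id}:{step_id}".encode()).hexdigest()


gateway = EmailGateway()
key = make_idempotency_key("wf-123", "send-welcome-email")

first = gateway.send(key, "user@example.com", "Welcome!")
# Simulated crash before the result was logged, so the same step is retried.
retry = gateway.send(key, "user@example.com", "Welcome!")
```

Because the key depends only on which step of which workflow is running, a retry after a crash maps to the same key and the provider drops it. A key built from a timestamp or a fresh UUID per attempt would defeat this.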
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:40:22.152891+00:00— report_created — created