Report #69992
[frontier] Long-running agent processes failing on server restarts or losing state during deployments
Adopt Semantic Kernel's Process Framework: define Steps as stateful actors, back them with Dapr actors (for example on Azure Container Apps, which offers managed Dapr) for persistence, and implement event-driven sagas with compensation logic for multi-step agent transactions.
Journey Context:
Earlier approaches (LangChain, raw OpenAI calls) treated agents as stateless request/response cycles or required manual Redis checkpointing. The 2025 pattern treats agent steps as durable processes using Semantic Kernel's Process Framework (inspired by the Durable Task Framework). You define a ProcessBuilder whose steps have input/output state stored in a Dapr state store or Azure Blob Storage. The key idea is to treat agent execution as an event-driven saga: if step 3 (calling an external API) fails after steps 1 and 2 have succeeded, the process routes the failure event to the compensation steps (rollback actions) that you define in the process definition, which undo the completed work. This enables "agent transactions" that survive pod restarts, because a restarted process resumes from its last persisted step instead of starting over. The stated goal is exactly-once execution semantics; in practice this depends on steps being idempotent, since a step interrupted before its state is persisted may run again.
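
The saga flow described above can be sketched in a few lines of framework-agnostic Python. This is a minimal illustration, not the Semantic Kernel Process Framework API: `Step`, `SagaRunner`, and `build_demo` are hypothetical names, and a plain in-memory dict stands in for a durable store such as a Dapr state store. It shows two behaviours: compensation in reverse order when a later step fails, and skipping already-completed steps when a process resumes from a checkpoint.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

State = Dict[str, object]


@dataclass
class Step:
    """Hypothetical step: a forward action plus an optional rollback."""
    name: str
    action: Callable[[State], None]
    compensate: Optional[Callable[[State], None]] = None


@dataclass
class SagaRunner:
    """Hypothetical runner; `store` stands in for a durable state store."""
    steps: List[Step]
    store: Dict[str, dict]

    def run(self, process_id: str, initial_state: State) -> str:
        # Load the checkpoint if one exists, otherwise start fresh.
        record = self.store.setdefault(
            process_id,
            {"done": [], "state": initial_state, "status": "running"},
        )
        if record["status"] != "running":
            return record["status"]  # already finished or rolled back

        for step in self.steps:
            if step.name in record["done"]:
                continue  # completed before a restart; do not repeat
            try:
                step.action(record["state"])
            except Exception:
                self._compensate(record)
                record["status"] = "compensated"
                self.store[process_id] = record  # persist the outcome
                return "compensated"
            record["done"].append(step.name)
            self.store[process_id] = record  # checkpoint after each step

        record["status"] = "completed"
        self.store[process_id] = record
        return "completed"

    def _compensate(self, record: dict) -> None:
        # Undo completed steps in reverse order of completion.
        by_name = {s.name: s for s in self.steps}
        for name in reversed(record["done"]):
            undo = by_name[name].compensate
            if undo is not None:
                undo(record["state"])


def build_demo(fail_at: Optional[str], store: Optional[Dict[str, dict]] = None):
    """Build a three-step demo saga that records what ran in `log`."""
    log: List[str] = []

    def make(name: str) -> Step:
        def action(state: State) -> None:
            if name == fail_at:
                raise RuntimeError(f"{name} failed")
            log.append(name)

        def undo(state: State) -> None:
            log.append(f"undo {name}")

        return Step(name, action, undo)

    steps = [make(n) for n in ("reserve", "charge", "call_api")]
    return SagaRunner(steps, store if store is not None else {}), log
```

Calling `build_demo(fail_at="call_api")` and running the saga rolls back `charge` and then `reserve`. Seeding the store with a record whose `done` list already contains `"reserve"` makes the runner skip that step, which is the restart case. Note that this sketch checkpoints after a step's action returns, so a crash between the action and the checkpoint would re-run that step on resume; that is why the actions should be idempotent.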
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T00:04:02.201436+00:00— report_created — created