Report #3548
[architecture] Long-running agent workflows leave partial state when they fail mid-flight
Model multi-step agent workflows as sagas with explicit compensating actions; on failure, run the compensations for the already-completed steps in reverse order.
Journey Context:
Agent workflows often span multiple tools, APIs, and agents. If step 3 fails after steps 1 and 2 have already written state, the system is left inconsistent. A saga splits a long-running transaction into a sequence of local transactions and pairs each one with a compensating action that undoes it. This is a well-established pattern for long-lived distributed transactions and applies directly to agent workflows that touch databases, file systems, or external services.
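A minimal sketch of the idea in Python follows. It is illustrative only and not taken from any particular agent framework. The names Step, SagaFailed, run_saga, and demo are hypothetical, and the sketch assumes that a step which raises has left nothing behind, so only the steps that completed are compensated.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]


@dataclass
class Step:
    """One local transaction plus the action that undoes it (hypothetical type)."""
    name: str
    action: Callable[[Context], None]
    compensate: Callable[[Context], None]


class SagaFailed(Exception):
    """Raised after compensations have been attempted for the completed steps."""

    def __init__(self, failed_step: str, cause: Exception,
                 compensation_errors: List[Tuple[str, Exception]]):
        super().__init__(f"step {failed_step!r} failed: {cause!r}")
        self.failed_step = failed_step
        self.cause = cause
        self.compensation_errors = compensation_errors


def run_saga(steps: List[Step], ctx: Context) -> None:
    completed: List[Step] = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception as exc:
            # Undo completed steps, most recent first.
            errors: List[Tuple[str, Exception]] = []
            for done in reversed(completed):
                try:
                    done.compensate(ctx)
                except Exception as comp_exc:
                    # Keep going so one bad compensation does not block the rest.
                    errors.append((done.name, comp_exc))
            raise SagaFailed(step.name, exc, errors) from exc
        completed.append(step)


def demo() -> List[str]:
    """Three-step workflow in which the third step fails."""
    log: List[str] = []

    def failing_call(ctx: Context) -> None:
        raise RuntimeError("external service unavailable")

    steps = [
        Step("write_db",
             lambda ctx: log.append("do write_db"),
             lambda ctx: log.append("undo write_db")),
        Step("write_file",
             lambda ctx: log.append("do write_file"),
             lambda ctx: log.append("undo write_file")),
        Step("call_api",
             failing_call,
             lambda ctx: log.append("undo call_api")),
    ]
    try:
        run_saga(steps, {})
    except SagaFailed as err:
        log.append(f"failed at {err.failed_step}")
    return log
```

Calling demo() records both writes, then the two undo actions in reverse order, then the failure marker. In a real workflow the compensations would typically need to be safe to retry, and any compensation errors collected in SagaFailed would need manual or automated follow-up; both points are outside this sketch.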
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T17:32:17.437795+00:00— report_created — created