Report #26602
[frontier] Multi-agent workflow fails partway, leaving the system in an inconsistent state (e.g., charged but not delivered)
Implement saga orchestration: each agent step supplies a compensating action alongside its main action; on failure, the orchestrator executes the compensations of the completed steps in reverse order to restore a consistent state
Journey Context:
In distributed agent systems (e.g., research_agent → writing_agent → publishing_agent), a partial failure such as a payment succeeding while the follow-up notification fails is not repaired by naive retry loops, which leave orphaned transactions behind. The saga pattern from microservices is now being applied to agents (2025): each agent capability exposes not just its main action but also compensation (rollback) logic. A central saga orchestrator (which can be a dedicated agent or a Temporal workflow) maintains an execution log. On any failure, it runs the compensations of the completed steps in LIFO order, for example refunding the payment and releasing the booking. This provides 'eventual consistency' without locking resources. Critical: compensations must be idempotent and must handle partial failure themselves. The alternative, distributed transactions via two-phase commit (2PC), blocks for too long to be practical with external APIs or long-running LLM calls. Saga fits the asynchronous, potentially hours-long nature of complex agent workflows.
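The orchestration described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production design: the names SagaStep, SagaOrchestrator, run_demo, and the payment/booking/notification functions are all hypothetical, and a real deployment would persist the execution log (for example in a workflow engine) rather than keep it in a list.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]


@dataclass
class SagaStep:
    """One agent step: a main action plus the compensation that undoes it."""
    name: str
    action: Callable[[Context], None]
    compensation: Callable[[Context], None]


@dataclass
class SagaOrchestrator:
    steps: List[SagaStep]
    log: List[str] = field(default_factory=list)  # in-memory execution log

    def run(self, ctx: Context) -> bool:
        completed: List[SagaStep] = []
        for step in self.steps:
            try:
                step.action(ctx)
            except Exception as exc:
                self.log.append(f"failed:{step.name}:{exc}")
                self._compensate(completed, ctx)
                return False
            completed.append(step)
            self.log.append(f"done:{step.name}")
        return True

    def _compensate(self, completed: List[SagaStep], ctx: Context) -> None:
        # Undo completed steps in LIFO order.
        for step in reversed(completed):
            try:
                step.compensation(ctx)
                self.log.append(f"compensated:{step.name}")
            except Exception as exc:
                # Record the failure and keep going so later compensations still run.
                self.log.append(f"compensation_failed:{step.name}:{exc}")


# Hypothetical agent capabilities. Each compensation checks state first,
# so calling it twice has the same effect as calling it once (idempotent).
def charge_payment(ctx: Context) -> None:
    ctx["charged"] = True


def refund_payment(ctx: Context) -> None:
    if ctx.get("charged"):
        ctx["charged"] = False


def reserve_booking(ctx: Context) -> None:
    ctx["booked"] = True


def release_booking(ctx: Context) -> None:
    if ctx.get("booked"):
        ctx["booked"] = False


def send_notification(ctx: Context) -> None:
    raise RuntimeError("notification service unavailable")


def no_compensation(ctx: Context) -> None:
    pass


def run_demo():
    ctx: Context = {}
    saga = SagaOrchestrator([
        SagaStep("charge_payment", charge_payment, refund_payment),
        SagaStep("reserve_booking", reserve_booking, release_booking),
        SagaStep("send_notification", send_notification, no_compensation),
    ])
    ok = saga.run(ctx)
    return ok, ctx, saga.log
```

In this sketch the step that raised is not itself compensated; only steps that completed are rolled back. If a failing action can leave partial side effects, its compensation would need to be run as well, which is one reason compensations should tolerate being called on a step that only partly happened.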
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:03:08.864048+00:00— report_created — created