Report #64332
[architecture] Partial failure in a chain of agents leaves system in inconsistent state (e.g., Agent A booked hotel, Agent B failed to book flight)
Implement the Saga pattern with compensating transactions: each agent action has a matching 'compensate' (undo) action, and the orchestrator runs the compensations for already-completed steps when a later step fails
Journey Context:
Developers often treat agent calls as fire-and-forget or wrap them in simple retry loops, ignoring that real-world actions (payments, reservations, notifications) often can't be safely retried without idempotency keys and can't be left half-done. The Saga pattern (from microservices) applies here: break the workflow into steps, each paired with a 'compensating transaction' that semantically undoes the step. If Agent 3 fails, the orchestrator triggers compensation for the steps that already completed, in reverse order: Agent 2, then Agent 1 (e.g., refund payment, cancel reservation). The saga log must be persisted (for example, in an event store) to survive orchestrator crashes. Tradeoff: compensation logic is hard to write (not all actions are undoable), and coordination adds latency. Alternative (2PC, two-phase commit) rejected because it requires locking resources across autonomous agents, which violates the principle of loose coupling and availability.
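A minimal sketch of the orchestrator described above, in Python. Every name here (`Step`, `SagaLog`, `run_saga`, `demo`) is hypothetical and not taken from any library, and a JSON-lines file stands in for a real event store. It shows compensation of completed steps in reverse order, with each transition appended to the log before the orchestrator moves on.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List, Tuple


@dataclass
class Step:
    name: str
    action: Callable[[Dict], None]      # performs the agent's side effect
    compensate: Callable[[Dict], None]  # semantically undoes that side effect


class SagaLog:
    """Append-only JSON-lines file; an assumed stand-in for an event store."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def append(self, saga_id: str, step: str, status: str) -> None:
        record = {"saga": saga_id, "step": step, "status": status}
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    def events(self) -> List[Dict]:
        if not self.path.exists():
            return []
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]


def run_saga(saga_id: str, steps: List[Step], ctx: Dict, log: SagaLog) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    completed: List[Step] = []
    for step in steps:
        log.append(saga_id, step.name, "started")
        try:
            step.action(ctx)
        except Exception:
            log.append(saga_id, step.name, "failed")
            for done in reversed(completed):
                done.compensate(ctx)
                log.append(saga_id, done.name, "compensated")
            return False
        log.append(saga_id, step.name, "completed")
        completed.append(step)
    return True


def demo() -> Tuple[bool, List[str], List[Tuple[str, str]]]:
    """Hotel booking succeeds, flight booking fails, hotel is cancelled."""
    calls: List[str] = []

    def fail_flight(ctx: Dict) -> None:
        raise RuntimeError("flight booking failed")

    steps = [
        Step("hotel",
             lambda ctx: calls.append("book_hotel"),
             lambda ctx: calls.append("cancel_hotel")),
        Step("flight", fail_flight, lambda ctx: calls.append("cancel_flight")),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        log = SagaLog(Path(tmp) / "saga.jsonl")
        ok = run_saga("trip-1", steps, {}, log)
        events = [(e["step"], e["status"]) for e in log.events()]
    return ok, calls, events
```

The sketch leaves out several things a real orchestrator would need: resuming from the log after a crash, retrying or escalating a compensation that itself fails, and passing idempotency keys so a step that logged "started" but never "completed" can be retried or undone safely.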
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:28:00.950491+00:00— report_created — created