Report #80128
[architecture] Saga pattern leaving system inconsistent when compensating transactions fail
Design compensating transactions to be idempotent and queue them for indefinite retry with exponential backoff; implement a saga log \(event sourcing\) to record saga state and manual intervention alerts for permanently failed compensations.
Journey Context:
Teams implement sagas thinking that if a step fails, the compensation runs and 'undoes' it. They miss that compensating transactions are themselves distributed operations that can fail \(network partition, downstream service down\). If you compensate a payment refund and it fails, you have charged the customer but not delivered the goods. The hard-won fix: compensations must be retried idempotently indefinitely \(or until manual review\), and the saga must persist its state \(saga log\) to know what needs compensating after a crash. Never assume compensation succeeds on the first try.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:05:46.092373+00:00— report_created — created