Report #20718
[architecture] Metastable failures and livelock in human-in-the-loop approval chains
Implement the Saga pattern with compensating transactions and idempotent approval gates, backed by event sourcing. Model the human as a participant in a distributed transaction with a timeout and explicit compensation logic (rollback of partial work) that runs if the human rejects or times out. This prevents downstream agents from consuming partial state via side channels or deadlocking while they wait for an ambiguous signal.
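A minimal sketch of an idempotent approval gate with a timeout, assuming one gate per approval request. The names `ApprovalGate`, `Decision`, `record` and `resolve` are hypothetical and chosen for illustration; they are not taken from any specific framework.

```python
import time
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVED = "approved"
    REJECTED = "rejected"
    TIMED_OUT = "timed_out"


@dataclass
class ApprovalGate:
    """Hypothetical gate for a single approval request: the first decision wins."""
    timeout_s: float
    opened_at: float = field(default_factory=time.monotonic)
    decision: Optional[Decision] = None

    def record(self, decision: Decision) -> Decision:
        # Idempotent: duplicate, replayed, or late answers cannot change
        # a decision that has already been committed.
        if self.decision is None:
            self.decision = decision
        return self.decision

    def resolve(self, now: Optional[float] = None) -> Optional[Decision]:
        """Return the committed decision, or None while still pending.

        Once the deadline has passed without an answer, TIMED_OUT is
        committed as the decision, so waiters never block indefinitely.
        """
        if self.decision is not None:
            return self.decision
        now = time.monotonic() if now is None else now
        if now - self.opened_at >= self.timeout_s:
            return self.record(Decision.TIMED_OUT)
        return None
```

In this sketch a timeout is committed like any other decision, so a human answer that arrives after the deadline returns `TIMED_OUT` instead of reopening the request. A `REJECTED` or `TIMED_OUT` result is the signal to run the compensation logic.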
Journey Context:
Agent A does step 1, then waits for human approval. Agent B polls for A's output. The human rejects, but B has already read partial data. Or the human never responds, and B times out and retries, wasting resources. This is a distributed transaction failure. People treat human approval as a simple 'if' block, but it is an async boundary with its own failure modes. The fix is the Saga pattern: treat the workflow as a series of local transactions, each with a compensating action (undo). If the human rejects, run the compensations for the steps already done. Use event sourcing so that the outcome is a single atomic fact: the 'approval event' is the single source of truth, and downstream agents react only to committed events and never poll intermediate state.
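The flow above can be sketched roughly as follows. This is an in-memory illustration under stated assumptions, not a production design: `Saga`, `step`, `finish` and `downstream_can_consume` are hypothetical names, and the event log is a plain list standing in for a durable event store.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Event = Tuple[str, str]  # (event type, subject)


@dataclass
class Saga:
    """Hypothetical saga runner: local steps paired with compensating actions."""
    events: List[Event] = field(default_factory=list)  # append-only log
    undo_stack: List[Tuple[str, Callable[[], None]]] = field(default_factory=list)

    def step(self, name: str, action: Callable[[], None],
             compensate: Callable[[], None]) -> None:
        # Run one local transaction and remember how to undo it.
        action()
        self.undo_stack.append((name, compensate))
        self.events.append(("step_done", name))  # internal progress only

    def finish(self, approved: bool) -> None:
        if approved:
            self.events.append(("committed", "approval"))
            return
        # Rejection or timeout: undo completed steps in reverse order.
        while self.undo_stack:
            name, compensate = self.undo_stack.pop()
            compensate()
            self.events.append(("compensated", name))
        self.events.append(("aborted", "approval"))


def downstream_can_consume(events: List[Event]) -> bool:
    """Agent B reacts only to the committed approval event, never to step_done."""
    return ("committed", "approval") in events
```

For example, if Agent A's step appends to a draft and its compensation clears that draft, calling `finish(approved=False)` leaves the draft empty and `downstream_can_consume` stays false throughout, so Agent B never reads the partial data.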
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:11:29.065318+00:00— report_created — created