Report #30192
[architecture] Human-in-the-loop checkpoints leave workflows in unrecoverable states or with partial side effects
Implement the Saga pattern with compensating transactions; persist workflow state in a durable execution platform (Temporal/Cadence) so rejection triggers explicit rollback handlers for each previously executed agent step
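A minimal in-memory sketch of the compensation idea, in plain Python rather than the Temporal or Cadence SDKs. The `Saga` class, `run_workflow`, the step names, and the `ledger` dict are all hypothetical and exist only for illustration; a real deployment would keep this bookkeeping in the durable platform.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Saga:
    """Records a compensating action for each step that has completed."""
    compensations: List[Tuple[str, Callable[[], None]]] = field(default_factory=list)

    def run_step(self, name, action, compensate):
        action()  # if this raises, no compensation is recorded for the step
        self.compensations.append((name, compensate))

    def rollback(self):
        """Undo completed steps in reverse order; return the names undone."""
        undone = []
        while self.compensations:
            name, compensate = self.compensations.pop()
            compensate()
            undone.append(name)
        return undone


def run_workflow(human_approves: bool, ledger: dict) -> List[str]:
    saga = Saga()
    # Hypothetical Agent A: reserves something.
    saga.run_step(
        "agent_a_reserve",
        action=lambda: ledger.update(reserved=True),
        compensate=lambda: ledger.update(reserved=False),
    )
    # Hypothetical Agent B: charges $100; its compensation refunds $100.
    saga.run_step(
        "agent_b_charge",
        action=lambda: ledger.update(charged=ledger["charged"] + 100),
        compensate=lambda: ledger.update(charged=ledger["charged"] - 100),
    )
    # Human checkpoint: a rejection triggers the rollback handlers.
    if not human_approves:
        return saga.rollback()
    return []


ledger = {"reserved": False, "charged": 0}
undone = run_workflow(human_approves=False, ledger=ledger)
```

Compensations run in reverse order of execution, so the charge is refunded before the reservation is released. This sketch holds its state in memory, so on its own it would not survive a restart.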
Journey Context:
Simple HITL implementations 'pause' the process, but if the human says 'no,' Agents A and B have already executed and written to databases. Without compensating transactions (Saga pattern), you cannot undo the partial work. Common mistake: storing the 'waiting for human' state in memory only (e.g., Redis without persistence), causing zombie workflows after restarts. The orchestrator must be a durable execution environment that survives restarts and knows exactly which compensating actions to run, e.g., 'if Agent B charged $100, refund $100.' This transforms HITL from a 'stop/go' gate into a recoverable, auditable business process with ACID-like guarantees across agent boundaries: every completed step is either kept or explicitly compensated, though without the isolation of a true ACID transaction.
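To illustrate why the 'waiting for human' state has to outlive the process, here is a rough sketch that persists it in a SQLite file from the Python standard library. SQLite stands in for a durable execution platform and is not how Temporal or Cadence work internally. The table layout, status strings, `COMPENSATIONS` registry, and helper names are assumptions made up for this example.

```python
import json
import os
import sqlite3
import tempfile

# Hypothetical registry: step name -> compensating action on a shared ledger.
COMPENSATIONS = {
    "agent_a_reserve": lambda ledger: ledger.update(reserved=False),
    "agent_b_charge": lambda ledger: ledger.update(charged=ledger["charged"] - 100),
}


def open_store(path):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS workflows "
        "(id TEXT PRIMARY KEY, status TEXT NOT NULL, steps TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def save(conn, workflow_id, status, steps):
    conn.execute(
        "INSERT OR REPLACE INTO workflows (id, status, steps) VALUES (?, ?, ?)",
        (workflow_id, status, json.dumps(steps)),
    )
    conn.commit()


def load(conn, workflow_id):
    row = conn.execute(
        "SELECT status, steps FROM workflows WHERE id = ?", (workflow_id,)
    ).fetchone()
    return row[0], json.loads(row[1])


def reject(conn, workflow_id, ledger):
    """Run compensations for every recorded step, newest first."""
    status, steps = load(conn, workflow_id)
    if status != "WAITING_FOR_HUMAN":
        raise ValueError(f"workflow {workflow_id} is {status}, not awaiting review")
    for name in reversed(steps):
        COMPENSATIONS[name](ledger)
    save(conn, workflow_id, "COMPENSATED", steps)


db_path = os.path.join(tempfile.mkdtemp(), "workflows.db")
ledger = {"reserved": True, "charged": 100}  # side effects already applied

conn = open_store(db_path)
save(conn, "wf-1", "WAITING_FOR_HUMAN", ["agent_a_reserve", "agent_b_charge"])
conn.close()  # simulate the orchestrator process restarting

conn = open_store(db_path)  # a fresh process finds the pending workflow on disk
reject(conn, "wf-1", ledger)
final_status, _ = load(conn, "wf-1")
conn.close()
```

Because the pending state and the list of completed steps are on disk, the rejection can still be handled after the restart. The sketch does not make the compensations and the status update atomic, so a production version would also need idempotent compensating actions in case a crash happens midway through the rollback.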
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T05:03:56.363106+00:00— report_created — created