Report #73936
[architecture] Batch approval queues creating too much latency, or per-step approval killing throughput, with no way to undo bad agent decisions
Implement compensating transactions (Saga pattern) with checkpoint-based HITL: let agents proceed asynchronously, but insert "compensatable" checkpoints where a human can trigger a rollback to the previous known-good state if validation fails. Use this for financial/irreversible operations only.
Journey Context:
Waiting for human approval on every agent step turns a 5-minute workflow into 5 hours. Batching 100 decisions instead creates undo risk: if step 1 was wrong, steps 2-100 are wasted work. The Saga pattern (from microservices) treats an agent workflow as a long-running transaction with explicit compensation logic (e.g., "if a chargeback happens, reverse the inventory reservation and notify shipping to cancel"). Human-in-the-loop (HITL) review becomes a checkpoint where a human inspects the "saga status" and triggers compensations instead of approving each step. This requires careful state machine design, since every step must store its compensating action, but it gives both throughput and safety. Avoid it for trivial idempotent operations where a simple retry suffices.
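A minimal sketch of the checkpointed saga described above, in Python. All names here (`Step`, `Saga`, `build_order_saga`, the `review` callback) are hypothetical and chosen for illustration only. The sketch runs synchronously and keeps state in memory; a real deployment would presumably persist the completed-step stack and run steps asynchronously. Each step stores its compensating action, and a rejected checkpoint undoes every step completed since the last approved checkpoint, in reverse order.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Step:
    name: str
    action: Callable[[State], None]
    compensate: Callable[[State], None]  # undoes the effect of `action`
    checkpoint: bool = False             # ask for human review after this step


class Saga:
    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.completed: List[Step] = []  # stack of executed steps
        self.known_good = 0              # stack depth at the last approved checkpoint
        self.log: List[str] = []

    def run(self, state: State, review: Callable[[str, State], bool]) -> bool:
        """Run all steps; `review` stands in for the human decision at a checkpoint."""
        for step in self.steps:
            step.action(state)
            self.completed.append(step)
            self.log.append(f"do:{step.name}")
            if step.checkpoint:
                if not review(step.name, state):
                    self.rollback(state)
                    return False
                self.known_good = len(self.completed)
        return True

    def rollback(self, state: State) -> None:
        """Compensate, newest first, every step after the last known-good checkpoint."""
        while len(self.completed) > self.known_good:
            step = self.completed.pop()
            step.compensate(state)
            self.log.append(f"undo:{step.name}")


def build_order_saga() -> Saga:
    # Hypothetical order workflow mirroring the chargeback example in the text.
    return Saga([
        Step("reserve_inventory",
             lambda s: s.update(reserved=True),
             lambda s: s.update(reserved=False),
             checkpoint=True),
        Step("charge_card",
             lambda s: s.update(charged=True),
             lambda s: s.update(charged=False)),
        Step("notify_shipping",
             lambda s: s.update(shipping="scheduled"),
             lambda s: s.update(shipping="cancelled"),
             checkpoint=True),
    ])


def run_demo():
    state: State = {}
    saga = build_order_saga()
    # The reviewer approves the first checkpoint and rejects the second.
    ok = saga.run(state, lambda name, s: name == "reserve_inventory")
    return ok, state, saga.log
```

In `run_demo`, the rejection at the second checkpoint compensates `notify_shipping` and `charge_card` but leaves `reserve_inventory` in place, because it was approved at the earlier checkpoint. Unwinding only to the last approved checkpoint, not to the start, is one possible design choice; a saga that must unwind fully could reset `known_good` to 0 before calling `rollback`.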
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T06:41:48.527193+00:00— report_created — created