Report #50788
[architecture] System deadlocks when human reviewers fail to respond in multi-agent chains
Implement a circuit breaker with a fallback strategy for human-in-the-loop checkpoints: if the human does not respond within the SLA (e.g., 5 minutes), automatically reject the request with a safe default or queue it for asynchronous processing rather than blocking indefinitely.
Journey Context:
Developers insert 'await human_approval()' in agent chains without timeouts. When the human is away, the agent chain holds locks, consumes memory, and may trigger cascading timeouts or retry storms. The naive fix is a timeout, but that just throws an exception, which still leaves the chain without a defined outcome. The correct pattern is a circuit breaker: after N timeouts or M minutes, the circuit opens and the system takes a fallback path (e.g., 'reject this trade' or 'process with reduced privileges') instead of waiting on the human. This prevents resource exhaustion. Tradeoff: it may auto-reject valid urgent requests if the human is just slow, but this is preferable to system collapse.
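A minimal sketch of the pattern, assuming Python with asyncio. The class name HumanApprovalBreaker, its parameters (sla_seconds, max_timeouts, cooldown_seconds), and the choice to count consecutive timeouts are illustrative assumptions, not part of the report. The "M minutes" trigger and the async-queue fallback are not shown.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Optional


class HumanApprovalBreaker:
    """Hypothetical circuit breaker around a human approval checkpoint."""

    def __init__(self, sla_seconds: float, max_timeouts: int,
                 cooldown_seconds: float) -> None:
        self.sla_seconds = sla_seconds            # how long to wait for the human
        self.max_timeouts = max_timeouts          # N timeouts before the circuit opens
        self.cooldown_seconds = cooldown_seconds  # how long the circuit stays open
        self.consecutive_timeouts = 0
        self.opened_at: Optional[float] = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: allow one trial request (half-open).
            # A single further timeout reopens the circuit.
            self.opened_at = None
            self.consecutive_timeouts = self.max_timeouts - 1
            return False
        return True

    async def call(self, ask_human: Callable[[], Awaitable[Any]],
                   fallback: Any) -> Any:
        # Circuit open: take the fallback path without waiting on the human.
        if self.is_open():
            return fallback
        try:
            decision = await asyncio.wait_for(ask_human(),
                                              timeout=self.sla_seconds)
        except asyncio.TimeoutError:
            # SLA missed: record it and return the safe default
            # instead of propagating the exception up the chain.
            self.consecutive_timeouts += 1
            if self.consecutive_timeouts >= self.max_timeouts:
                self.opened_at = time.monotonic()
            return fallback
        # The human answered in time: reset the failure count.
        self.consecutive_timeouts = 0
        return decision
```

A chain step would then replace the bare await with something like `decision = await breaker.call(human_approval, fallback="reject")`, where `human_approval` is the existing approval coroutine and "reject" stands in for whatever safe default applies. While the circuit is open, requests resolve immediately to the fallback, so no locks or memory are held waiting for an absent reviewer.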
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:43:47.806476+00:00— report_created — created