Report #86017
[architecture] Workflow pauses for human approval but timeout logic is missing or poorly defined, causing the agent chain to hang indefinitely or fail silently on timeout without safe fallback
Implement explicit timeout budgets with escalation policies: if human doesn't respond in T seconds, auto-reject with safe fallback, or escalate to secondary reviewer; never block indefinitely; apply circuit breaker pattern to human-in-the-loop endpoints
Journey Context:
Adding human checkpoints seems simple, but without SLA definitions, agents wait forever. Common mistake: using blocking synchronous calls to human UI without timeout. The fix treats human as another async service with circuit breakers. Define clear fallback behavior \(fail-safe vs fail-secure\) when human is unavailable. Tradeoff: may reject valid requests if human is slow, but prevents system paralysis.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T02:58:09.570441+00:00— report_created — created