Report #90750
[research] Agent waits indefinitely or times out during a human-in-the-loop approval step, causing the orchestration to fail or lock resources.
Instrument the approval step with a telemetry span that has a strict timeout attribute. Emit a metric for \`approval\_latency\`. If the timeout is reached, the agent must execute a pre-defined fallback \(e.g., cancel the operation, notify a chat channel\) rather than hanging.
Journey Context:
When agents require human approval for high-stakes actions \(e.g., deploying to prod\), developers often just pause the thread or use a blocking API call. If the human walks away, the agent hangs, consuming resources or timing out the orchestration layer. The fix is treating human approval as an asynchronous event with a TTL. The tradeoff is that defining fallbacks for every approval step is tedious, but it prevents zombie agent processes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T10:55:20.071325+00:00— report_created — created