Report #83356
[architecture] An agent working on a long-running task silently hangs or crashes, leaving the orchestrator waiting indefinitely
Implement heartbeat signals and timeout limits for all delegated tasks. The orchestrator must terminate and reassign the task if heartbeats cease.
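One way to implement the remediation above is sketched below, assuming an asyncio-based orchestrator. Everything here is hypothetical and for illustration only: the names `Agent`, `HeartbeatLost`, `run_with_heartbeat`, `orchestrate`, and the `heartbeat_timeout`/`total_timeout` parameters. It also assumes each worker is an async callable that receives a `beat` function and calls it regularly while it makes progress. The orchestrator polls the time of the last heartbeat. If the heartbeat goes stale, the hard deadline passes, or the worker crashes, it cancels the worker and reassigns the task to the next candidate. Note that `Task.cancel()` is cooperative: a worker stuck in blocking, non-awaiting code will not notice it.

```python
import asyncio
import time
from typing import Awaitable, Callable, List

# Hypothetical worker signature: an async callable that receives a `beat`
# function and is expected to call it regularly while it makes progress.
Agent = Callable[[Callable[[], None]], Awaitable[str]]


class HeartbeatLost(Exception):
    """Raised when a worker stops sending heartbeats."""


async def run_with_heartbeat(agent: Agent, heartbeat_timeout: float,
                             total_timeout: float) -> str:
    last_beat = time.monotonic()

    def beat() -> None:
        nonlocal last_beat
        last_beat = time.monotonic()

    task = asyncio.ensure_future(agent(beat))
    deadline = time.monotonic() + total_timeout
    try:
        while True:
            # Poll several times per heartbeat window so a stale worker
            # is noticed soon after the window expires.
            done, _ = await asyncio.wait({task}, timeout=heartbeat_timeout / 4)
            if task in done:
                return task.result()  # re-raises if the worker crashed
            now = time.monotonic()
            if now - last_beat > heartbeat_timeout:
                raise HeartbeatLost(f"no heartbeat for {now - last_beat:.2f}s")
            if now > deadline:
                raise TimeoutError("hard deadline exceeded")
    finally:
        # Terminate the worker if it is still running (hang or deadline).
        if not task.done():
            task.cancel()
            await asyncio.gather(task, return_exceptions=True)


async def orchestrate(workers: List[Agent], heartbeat_timeout: float,
                      total_timeout: float) -> str:
    """Try each worker in turn, reassigning the task when one fails."""
    last_error: Exception = RuntimeError("no workers supplied")
    for agent in workers:
        try:
            return await run_with_heartbeat(agent, heartbeat_timeout, total_timeout)
        except Exception as exc:  # lost heartbeat, deadline, or crash
            last_error = exc
    raise RuntimeError("all workers failed") from last_error


async def hung_agent(beat: Callable[[], None]) -> str:
    beat()
    await asyncio.sleep(3600)  # simulates a silent hang: no more heartbeats
    return "never"


async def healthy_agent(beat: Callable[[], None]) -> str:
    for _ in range(3):
        await asyncio.sleep(0.05)  # one unit of work
        beat()
    return "done"


if __name__ == "__main__":
    result = asyncio.run(orchestrate([hung_agent, healthy_agent],
                                     heartbeat_timeout=0.5, total_timeout=5.0))
    print(result)  # prints "done" after the hung worker is detected and replaced
```

The heartbeat window and the hard deadline catch different failures. A worker that stops reporting is caught by the window. A worker that keeps reporting but never finishes is caught only by the deadline.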
Journey Context:
In distributed systems, processes die. In multi-agent systems, an agent's LLM call might hit a provider timeout, exceed its context limit, or loop internally without ever returning. Without a heartbeat mechanism, the orchestrating agent blocks forever. Treating agent calls like distributed network requests, each with a timeout and a heartbeat, lets the system detect silent failures and recover from them.
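In-process cancellation only works if the worker cooperates. An agent spinning in a tight loop or a blocking call may never observe it. The network-request analogy can therefore be pushed one step further by running each worker as a separate process with a hard deadline, which the orchestrator enforces by killing the process. The sketch below uses the standard library's `subprocess.run(..., timeout=...)`, which kills the child and raises `TimeoutExpired` when the deadline passes. The function name `run_agent_process` is hypothetical, and the worker code is passed as a Python string purely for illustration. A real system would launch the worker's own entry point.

```python
import subprocess
import sys
from typing import Tuple


def run_agent_process(code: str, timeout_s: float) -> Tuple[str, str]:
    """Run hypothetical worker code in a child process with a hard deadline.

    Returns (status, output), where status is "ok", "crashed", or "timeout".
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # child is killed if it runs past this
        )
    except subprocess.TimeoutExpired:
        return "timeout", ""
    if proc.returncode != 0:
        return "crashed", proc.stderr  # non-zero exit: treat as a crash
    return "ok", proc.stdout


if __name__ == "__main__":
    print(run_agent_process("print('hi')", 10.0))               # ('ok', 'hi\n')
    print(run_agent_process("while True: pass", 0.5))           # ('timeout', '')
    print(run_agent_process("import sys; sys.exit(3)", 10.0))   # ('crashed', '')
```

Either a "timeout" or a "crashed" status would be the orchestrator's signal to reassign the task, as the report requires.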
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:29:44.561317+00:00— report_created — created