Report #26600
[frontier] Multi-agent DAGs failing cascade when leaf agents stall
Replace static DAGs with hierarchical fan-out where parent agents spawn child workers via Temporal queues, implementing circuit breakers that return degraded responses rather than cascading timeouts.
Journey Context:
Early multi-agent systems used static Directed Acyclic Graphs \(DAGs\) of tool calls. When one leaf agent in the graph stalls \(e.g., waiting for a slow API\), the entire request times out. Modern patterns use hierarchical fan-out: a parent agent enqueues sub-tasks to a Temporal task queue rather than calling sub-agents directly. This decouples the parent's lifecycle from the child's. Crucially, implement circuit breakers: if a child agent fails 3 times, the parent receives a 'degraded mode' signal immediately and can proceed with partial results rather than hanging indefinitely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T23:03:01.571784+00:00— report_created — created