Report #22576
[frontier] Peer-to-peer multi-agent system has no clear control flow — agents talk past each other, duplicate work, or deadlock
Use a supervisor topology: one supervisor agent routes tasks to specialist worker agents and aggregates their results. Workers never communicate with each other directly. For critical decisions, set interrupt_before on the supervisor's routing node to enable human oversight. The supervisor maintains the global plan; workers are stateless executors.
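A minimal sketch of the routing described above, in plain Python rather than any specific framework. The names run_supervisor, research_worker, writer_worker and WORKERS are hypothetical, and the fixed plan stands in for whatever routing logic (for example an LLM call) a real supervisor would use.

```python
from typing import Callable, Dict, List

# A worker is a stateless function: it sees only the context the
# supervisor hands it and returns a result. It never calls another worker.
Worker = Callable[[str], str]


def research_worker(context: str) -> str:
    # Hypothetical specialist: stands in for a research agent.
    return f"notes on {context}"


def writer_worker(context: str) -> str:
    # Hypothetical specialist: stands in for a writing agent.
    return f"draft from {context}"


WORKERS: Dict[str, Worker] = {
    "research": research_worker,
    "write": writer_worker,
}


def run_supervisor(goal: str, plan: List[str], workers: Dict[str, Worker]) -> Dict[str, str]:
    """Run each planned step in order; only the supervisor holds global state."""
    results: Dict[str, str] = {}
    context = goal
    for name in plan:
        # The supervisor picks the worker and passes exactly the context it needs.
        output = workers[name](context)
        results[name] = output
        # The supervisor, not the worker, decides what the next step sees.
        context = output
    return results
```

Calling run_supervisor("supervisor topologies", ["research", "write"], WORKERS) runs the two workers in order, with the writer receiving only the research output. Because every hand-off goes through one function, the control flow can be logged and tested in one place.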
Journey Context:
Early multi-agent systems (AutoGen's original conversational pattern) used peer-to-peer communication: agents chat with each other freely. This sounds elegant but fails in practice for four reasons: (1) no one owns the global state, so each agent has only a partial view; (2) conversations diverge as agents respond to different aspects of the task; (3) error recovery is undefined, because when one agent fails the others are not told; (4) debugging is nearly impossible because control flow is emergent rather than declared. The supervisor pattern (formalized in LangGraph's multi-agent concepts) centralizes control: the supervisor decides which agent works next, passes it exactly the context it needs, and processes the result. This makes the system predictable, debuggable, and testable. The tradeoff is that the supervisor becomes a bottleneck and a single point of failure. In practice, though, this centralization is a feature, because it makes the system's behavior reproducible. Combined with interrupt_before on the supervisor's routing node, you get human-in-the-loop control at the exact point where it matters: when the system decides what to do next.
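The human-in-the-loop idea can be sketched without a framework as an approval check that runs before each routing decision is executed. This is not LangGraph's API: run_with_gate and the approve callback are hypothetical stand-ins for what interrupt_before provides on a routing node, and a real approver would prompt a person rather than apply a rule.

```python
from typing import Callable, Dict, List

Worker = Callable[[str], str]
# Hypothetical approver signature: (next worker name, current context) -> proceed?
Approver = Callable[[str, str], bool]


def run_with_gate(
    plan: List[str],
    workers: Dict[str, Worker],
    context: str,
    approve: Approver,
) -> List[str]:
    """Execute the plan, pausing for approval before every routing step."""
    log: List[str] = []
    for name in plan:
        # Pause point: the decision is reviewed before the worker runs,
        # so a rejected step has no side effects.
        if not approve(name, context):
            log.append(f"halted before {name}")
            break
        context = workers[name](context)
        log.append(f"{name}: {context}")
    return log
```

For example, with an approver that rejects a step named "deploy", a plan of ["build", "deploy"] runs the build worker and then stops with "halted before deploy" in the log, without ever calling the deploy worker.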
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:18:07.672739+00:00— report_created — created