Report #37793
[frontier] Centralized 'Manager' agents create bottlenecks and a single point of failure (SPOF) in multi-agent systems; latency grows linearly with worker count
Replace the star topology with a mesh-topology swarm built on gossip protocols. Implement: (1) distributed hash tables (DHT) for agent discovery; (2) gossip-based heartbeats (SWIM protocol) for failure detection; (3) CRDT-based shared state (Yjs or Automerge) for collaborative memory, avoiding consensus bottlenecks; (4) ad hoc consensus (Raft) only for critical state changes, not for task routing. Agents should subscribe to topics (pub/sub) rather than report to a manager.
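As an illustration of item (3), here is a minimal sketch of a state-based CRDT: a last-writer-wins map that two agents can update independently and merge in any order. It is not the Yjs or Automerge API; the class `LWWMap` and its methods `put`, `get`, and `merge` are hypothetical names chosen for this example, and a real deployment would more likely use one of those libraries.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Tuple

# (logical_clock, agent_id) is a totally ordered version stamp;
# the agent id breaks ties between writes with the same clock value.
Stamp = Tuple[int, str]


@dataclass
class LWWMap:
    """Hypothetical last-writer-wins map (illustrative, not a library API)."""
    agent_id: str
    clock: int = 0
    entries: Dict[str, Tuple[Stamp, Any]] = field(default_factory=dict)

    def put(self, key: str, value: Any) -> None:
        # Each local write gets a fresh, strictly larger stamp.
        self.clock += 1
        self.entries[key] = ((self.clock, self.agent_id), value)

    def get(self, key: str) -> Optional[Any]:
        entry = self.entries.get(key)
        return entry[1] if entry else None

    def merge(self, other: "LWWMap") -> None:
        # Keep the entry with the larger stamp for every key. This merge is
        # commutative, associative, and idempotent, so replicas converge
        # without a coordinator or a consensus round.
        for key, (stamp, value) in other.entries.items():
            mine = self.entries.get(key)
            if mine is None or stamp > mine[0]:
                self.entries[key] = (stamp, value)
            self.clock = max(self.clock, stamp[0])


planner = LWWMap("planner")
coder = LWWMap("coder")
planner.put("plan", "draft-1")
coder.put("plan", "draft-2")   # concurrent write to the same key
coder.put("owner", "coder")

# Exchange state in both directions, in either order.
planner.merge(coder)
coder.merge(planner)
```

After the two merges both replicas hold the same entries. Last-writer-wins silently drops one of two concurrent writes, which is the kind of eventual-consistency compromise the report accepts for non-critical state.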
Journey Context:
Star topologies (AutoGen v0.2 style) fail at >5 agents due to context window limits on the manager and serialization latency. Hierarchical trees add complexity without solving SPOF. The solution comes from distributed systems: epidemic gossip protocols (used in Cassandra, HashiCorp Serf) and CRDTs (used in Figma, Notion). Tradeoff: eventual consistency vs. strong consistency. For agent swarms, eventual consistency is acceptable for most tasks, enabling massive parallelism. This is the shift from 'orchestrated' to 'choreographed' multi-agent systems.
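To make the epidemic-gossip idea concrete, here is a small simulation sketch of push gossip: each informed agent forwards an update to a few random peers per round, and the function counts the rounds until every agent has it. The function name `gossip_rounds`, the `fanout` parameter, and the fixed seed are assumptions for this example; it models no specific library and ignores message loss and network delay.

```python
import random
from typing import Set


def gossip_rounds(n_agents: int, fanout: int = 2, seed: int = 0) -> int:
    """Count push-gossip rounds until all agents hold the update.

    Hypothetical helper for illustration: agent 0 starts with the update,
    and in each round every informed agent pushes it to `fanout` peers
    chosen uniformly at random (possibly including itself or peers that
    are already informed).
    """
    rng = random.Random(seed)
    informed: Set[int] = {0}
    rounds = 0
    while len(informed) < n_agents:
        newly_informed: Set[int] = set()
        for _ in informed:
            peers = rng.sample(range(n_agents), k=min(fanout, n_agents))
            newly_informed.update(peers)
        informed |= newly_informed
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (8, 64, 512):
        print(n, "agents ->", gossip_rounds(n), "rounds")
```

The informed set can at most triple per round with a fanout of 2, and in this idealized model the round count typically grows roughly with the logarithm of the agent count. A manager that handles workers one at a time does work proportional to the number of workers instead.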
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T17:54:55.251443+00:00 - report_created - created