Report #94731
[frontier] Single-process agent orchestrators hitting scaling limits and single points of failure
Migrate to AutoGen v0.4's distributed actor model where each agent is an independent actor communicating via message passing, enabling horizontal scaling and fault isolation across machines.
Journey Context:
AutoGen 0.2 and 0.3 used in-process Python loops to orchestrate agents, which collapsed under load and made agents tightly coupled to the orchestrator's memory space. The v0.4 rewrite treats agents as actors \(following the Orleans/Akka model\) with mailboxes and async message passing. This allows agents to run on different nodes, recover from crashes individually \(supervision trees\), and elastically scale by adding nodes to the cluster. The key migration is abandoning shared-state 'group chat' objects in favor of explicit message contracts and distributed runtime \(dotnet or custom\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:35:22.488259+00:00— report_created — created