Report #35297
[frontier] How do I prevent exponential token costs and 'agent chatter' in group chats with more than three agents?
Replace broadcast or round-robin group chat with a hierarchical selector pattern: appoint a 'Group Manager' agent (or a lightweight classifier) that analyzes each incoming message and selects between one and three relevant agents from the pool to respond. Unselected agents remain silent. The manager uses structured output to specify which agents speak and in what order, creating a 'facilitated meeting' rather than a 'free-for-all'.
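A minimal sketch of the manager step described above, assuming the manager returns JSON of the form {"speakers": [...]}. The names Agent, parse_selection, facilitate and stub_manager are hypothetical and not part of AutoGen or CrewAI; stub_manager stands in for a real LLM call.

```python
import json
from dataclasses import dataclass
from typing import Callable

MAX_SPEAKERS = 3  # upper bound from the "one to three agents" rule


@dataclass
class Agent:
    name: str
    reply: Callable[[str], str]  # hypothetical: maps a message to a response


def parse_selection(raw: str, pool: dict[str, Agent]) -> list[str]:
    """Validate the manager's structured output: {"speakers": ["a", "b"]}."""
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("speakers"), list):
        raise ValueError("expected an object with a 'speakers' list")
    # Drop unknown names and duplicates while keeping the manager's order.
    ordered: list[str] = []
    for name in data["speakers"]:
        if isinstance(name, str) and name in pool and name not in ordered:
            ordered.append(name)
    if not 1 <= len(ordered) <= MAX_SPEAKERS:
        raise ValueError(f"expected 1-{MAX_SPEAKERS} known speakers, got {len(ordered)}")
    return ordered


def facilitate(
    message: str,
    pool: dict[str, Agent],
    manager: Callable[[str], str],
) -> list[tuple[str, str]]:
    """Ask the manager who should speak, then call only those agents, in order."""
    order = parse_selection(manager(message), pool)
    return [(name, pool[name].reply(message)) for name in order]


# Illustrative pool of four agents; only the selected ones are ever invoked.
pool = {
    "billing": Agent("billing", lambda m: "billing: checking the invoice"),
    "technical": Agent("technical", lambda m: "technical: checking the logs"),
    "legal": Agent("legal", lambda m: "legal: reviewing the terms"),
    "sales": Agent("sales", lambda m: "sales: preparing an offer"),
}


def stub_manager(message: str) -> str:
    # Stand-in for an LLM call constrained to a JSON schema (assumption).
    return json.dumps({"speakers": ["billing", "technical"]})


transcript = facilitate("I was charged twice after the outage", pool, stub_manager)
```

Validating the manager's output before dispatch keeps a malformed or over-long selection from silently turning back into a broadcast.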
Journey Context:
Standard multi-agent group chats (AutoGen GroupChat default, CrewAI Hierarchical) suffer from 'everyone talks' syndrome: every agent sees every message and decides whether to respond, so message volume grows as O(n²) in the number of agents. With 5+ agents this can consume thousands of dollars in tokens per task, much of it spent on redundant 'I agree' or 'Let me check that' replies from agents that have nothing to contribute. The pattern reported from production AutoGen deployments at Microsoft is the 'Selector' architecture (also called 'hierarchical group chat with speaker selection'). A lightweight LLM, or even a BERT classifier, examines the intent of each message, consults a capability registry (for example, which agents handle 'billing' and which handle 'technical'), and selects a subset of agents to respond. This reportedly reduces token consumption by 70-90% in 5-agent systems. The tradeoff is that the selector becomes a single point of failure, which is mitigated by falling back to deterministic routing rules when the selector fails. Selection is replacing round robin as the default for systems with more than three agents in 2025 because the number of responders per message is capped regardless of pool size, so cost scales sub-linearly with agent count.
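The capability registry and the deterministic fallback can be sketched as follows. This is an illustration under stated assumptions, not a specific framework's API: REGISTRY, DEFAULT_AGENT, rule_based_select and select_speakers are hypothetical names, and keyword overlap stands in for whatever routing rules a real deployment would use.

```python
import re
from typing import Callable, Optional

# Hypothetical capability registry: agent name -> keywords it handles.
REGISTRY: dict[str, set[str]] = {
    "billing": {"invoice", "refund", "charge", "charged", "payment"},
    "technical": {"error", "crash", "outage", "timeout", "bug"},
    "account": {"password", "login", "email"},
}
DEFAULT_AGENT = "technical"  # assumed catch-all when no keyword matches
MAX_SPEAKERS = 3


def rule_based_select(message: str) -> list[str]:
    """Deterministic fallback: rank agents by keyword overlap with the message."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    scored = [(len(words & keywords), name) for name, keywords in REGISTRY.items()]
    # Highest overlap first; ties broken alphabetically so results are stable.
    hits = sorted((item for item in scored if item[0] > 0), key=lambda t: (-t[0], t[1]))
    return [name for _, name in hits[:MAX_SPEAKERS]] or [DEFAULT_AGENT]


def select_speakers(
    message: str,
    selector: Optional[Callable[[str], list[str]]] = None,
) -> list[str]:
    """Try the primary selector; fall back to rules if it fails or returns junk."""
    if selector is not None:
        try:
            known = [name for name in selector(message) if name in REGISTRY]
            chosen = list(dict.fromkeys(known))  # de-duplicate, keep order
            if 1 <= len(chosen) <= MAX_SPEAKERS:
                return chosen
        except Exception:
            pass  # selector outage or malformed output: use the rules below
    return rule_based_select(message)


def broken_selector(message: str) -> list[str]:
    # Simulates the selector being unavailable.
    raise TimeoutError("selector unavailable")


routed = select_speakers("login error after payment", broken_selector)
```

Because the fallback never calls a model, a selector outage degrades routing quality but does not stop the conversation or revert it to a broadcast.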
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T13:42:57.938142+00:00— report_created — created