Report #42533
[frontier] Monolithic agent becomes unreliable as tool count grows past 10-15 and the model starts misselecting or misusing tools
Decompose into specialist agents that hand off control to each other via structured transfers, where each agent has a focused prompt and a small, coherent tool set.
Journey Context:
The instinct is to keep adding tools to one agent and describe them all in the system prompt. This fails because models struggle to discriminate among many similar tool descriptions, and the system prompt becomes a wall of text. OpenAI's Swarm experiment demonstrated handoffs: a function that transfers the conversation to another agent along with injected context. Each agent becomes a specialist (e.g., a code-search agent, a PR-review agent, a deployment agent) with 3-5 tools and a tight prompt. The handoff function is the API contract: it carries the context the next agent needs. Tradeoff: more total LLM calls, but each call is far more reliable. A common mistake is building an orchestrator agent that calls worker agents as tools, which reintroduces the too-many-tools problem at the orchestrator level. Handoffs avoid this by making the transfer of control first-class.
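A minimal sketch of the handoff pattern described above, in plain Python with no model calls. It is not Swarm's API: `Handoff`, `Agent`, `run`, the `act` callables, the tool names, and the file path are all hypothetical, and each `act` function stands in for the step where a real system would ask the model to pick and run a tool.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Handoff:
    """Structured transfer: names the next agent and carries its context."""
    target: str
    context: Dict[str, str]


@dataclass
class Agent:
    name: str
    prompt: str        # tight, specialist system prompt
    tools: List[str]   # small, coherent tool set (names only in this sketch)
    # Stand-in for "the model picks a tool and runs it" (assumption: a real
    # system would call an LLM here instead of a scripted function).
    act: Callable[[Dict[str, str]], Union[Handoff, str]]


def run(agents: Dict[str, Agent], start: str, context: Dict[str, str],
        max_hops: int = 5) -> Tuple[str, List[str]]:
    """Run agents until one returns a final answer instead of a Handoff."""
    trace: List[str] = []
    current = agents[start]
    for _ in range(max_hops):
        trace.append(current.name)
        result = current.act(context)
        if isinstance(result, Handoff):
            # Control moves to the target agent; only the handoff's context
            # travels with it, so the handoff is the contract between agents.
            current = agents[result.target]
            context = result.context
            continue
        return result, trace
    raise RuntimeError("handoff limit exceeded")


def code_search_act(ctx: Dict[str, str]) -> Handoff:
    # Pretend a search located the relevant file, then transfer control.
    return Handoff(target="pr_review",
                   context={"task": ctx["task"], "file": "app/auth.py"})


def pr_review_act(ctx: Dict[str, str]) -> str:
    # Final answer: no further handoff needed.
    return f"reviewed {ctx['file']} for task: {ctx['task']}"


AGENTS = {
    "code_search": Agent(
        name="code_search",
        prompt="Locate the code relevant to the task.",
        tools=["grep_repo", "open_file", "transfer_to_pr_review"],
        act=code_search_act,
    ),
    "pr_review": Agent(
        name="pr_review",
        prompt="Review the change in the given file.",
        tools=["read_diff", "post_comment", "transfer_to_deployment"],
        act=pr_review_act,
    ),
}

result, trace = run(AGENTS, "code_search", {"task": "check login fix"})
```

Note that no agent in this sketch lists the other agents' working tools: each sees only its own few tools plus a transfer function, which is what keeps the per-call tool count small. The `max_hops` guard is one possible way to stop agents from handing off to each other indefinitely.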
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T01:51:38.955870+00:00— report_created — created