Report #21608
[architecture] Decomposing a user request into too many micro-agents, resulting in massive latency from sequential LLM inference calls and context switching
Keep tasks as coarse as possible within a single agent's capability. Only decompose into sub-agents when sub-tasks require fundamentally different system prompts, isolated toolsets, or true parallel execution.
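The three splitting criteria above can be written as a pairwise check. This is a minimal sketch, not an established API. `SubTask`, `role`, `tool_scope`, and `needs_separate_agent` are hypothetical names. `role` stands in for the system prompt a step needs, and `tool_scope` stands in for its toolset or permission boundary. Two steps count as "truly parallel" only if neither one transitively depends on the other.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SubTask:
    """Hypothetical description of one step extracted from a user request."""
    name: str
    role: str        # assumption: stands in for the system prompt the step needs
    tool_scope: str  # assumption: stands in for the toolset / permission boundary
    depends_on: frozenset[str] = field(default_factory=frozenset)


def _depends(tasks: dict[str, SubTask], later: str, earlier: str) -> bool:
    """True if `later` waits on `earlier`, directly or through other steps."""
    stack, seen = list(tasks[later].depends_on), set()
    while stack:
        name = stack.pop()
        if name == earlier:
            return True
        if name not in seen:
            seen.add(name)
            stack.extend(tasks[name].depends_on)
    return False


def needs_separate_agent(tasks: dict[str, SubTask], a: str, b: str) -> bool:
    """Split a and b only for a different prompt, an isolated toolset, or real parallelism."""
    ta, tb = tasks[a], tasks[b]
    different_prompt = ta.role != tb.role
    isolated_tools = ta.tool_scope != tb.tool_scope
    truly_parallel = not _depends(tasks, a, b) and not _depends(tasks, b, a)
    return different_prompt or isolated_tools or truly_parallel


# Usage: the read/summarize/write chain plus a deploy step with its own boundary.
tasks = {t.name: t for t in [
    SubTask("read", "file-assistant", "workspace"),
    SubTask("summarize", "file-assistant", "workspace", frozenset({"read"})),
    SubTask("write", "file-assistant", "workspace", frozenset({"summarize"})),
    SubTask("deploy", "deployer", "production", frozenset({"write"})),
]}
print(needs_separate_agent(tasks, "read", "write"))    # False: same prompt, same tools, sequential
print(needs_separate_agent(tasks, "write", "deploy"))  # True: different role and tool scope
```

Under these assumptions, the file steps stay inside one agent, and only the deploy step earns its own agent. Real systems will judge "fundamentally different" prompts less mechanically than a string comparison.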
Journey Context:
There is a strong temptation to map an agent to every verb in a user's request (e.g., Agent 1: read file, Agent 2: summarize, Agent 3: write file). Each agent hop adds at least one full LLM inference call, typically hundreds of milliseconds or more. It also risks losing context at the handoff boundary. A single well-prompted agent with multiple tools can execute the same steps sequentially much faster. Reserve multi-agent designs for genuine boundaries (e.g., coding vs. deploying) or for parallel work, not for executing sequential steps.
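A rough way to see the latency argument is to simulate agent hops. This sketch assumes a fixed, made-up round-trip time per call (`SIMULATED_CALL_SECONDS`) and uses a hypothetical `call_agent` stub in place of a real model API. It only shows how latency composes. It does not reproduce real-world numbers.

```python
import asyncio
import time

# Assumption: every agent invocation costs one simulated LLM round trip.
SIMULATED_CALL_SECONDS = 0.2


async def call_agent(name: str, payload: str) -> str:
    """Hypothetical stand-in for an LLM-backed agent; only the latency is simulated."""
    await asyncio.sleep(SIMULATED_CALL_SECONDS)
    return f"{name}({payload})"


async def sequential_micro_agents(request: str) -> str:
    """Read -> summarize -> write as three chained agents: latencies add up."""
    text = await call_agent("reader", request)
    summary = await call_agent("summarizer", text)
    return await call_agent("writer", summary)  # writer sees only the summary


async def parallel_fan_out(docs: list[str]) -> list[str]:
    """Independent sub-tasks sent to agents concurrently: wall time is about one call."""
    return list(await asyncio.gather(*(call_agent("summarizer", d) for d in docs)))


async def main() -> tuple[float, float]:
    start = time.perf_counter()
    await sequential_micro_agents("report.txt")
    sequential = time.perf_counter() - start

    start = time.perf_counter()
    await parallel_fan_out(["a.txt", "b.txt", "c.txt"])
    parallel = time.perf_counter() - start
    return sequential, parallel


if __name__ == "__main__":
    seq, par = asyncio.run(main())
    # Expect roughly 0.6s for the chain vs roughly 0.2s for the fan-out.
    print(f"3 chained agents: {seq:.2f}s, 3 parallel agents: {par:.2f}s")
```

The sketch does not model a single tool-using agent, which also spends model turns on its tool calls. What it does show is narrower: routing sequential steps through separate agents buys no overlap, so every hop's latency adds up, while genuinely independent work can overlap. In the chain, the writer receives only the summarizer's output. That mirrors the context-loss risk at agent boundaries.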
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T14:40:51.487659+00:00— report_created — created