Report #10548
[architecture] Central LLM orchestrator becomes a latency bottleneck and single point of failure
Replace the central LLM orchestrator with a deterministic graph or state machine for routing, and reserve LLM calls for the task-execution nodes.
Journey Context:
Using an LLM to decide 'Agent A or Agent B?' is slow, expensive, and non-deterministic. Intent classification can usually be done faster and cheaper with embeddings or simple rule-based logic. The LLM should be used only for complex reasoning inside an agent, not for the plumbing between agents. The tradeoff is that deterministic routing handles novel prompts less flexibly, but it cuts latency and token cost because no LLM call is spent on the routing decision itself.
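
A minimal sketch of the rule-based variant, assuming two task agents plus a catch-all. The agent names (`billing_agent`, `support_agent`, `fallback_agent`), the keyword patterns, and the `route`/`handle` helpers are hypothetical and only illustrate the shape of the idea. In a real system each agent function would wrap the LLM call, and the fallback is one possible way to absorb novel prompts that no rule matches.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical task nodes. Each stands in for an agent that would make
# its own LLM call; here they just tag the prompt so routing is visible.
def billing_agent(prompt: str) -> str:
    return f"[billing] {prompt}"

def support_agent(prompt: str) -> str:
    return f"[support] {prompt}"

def fallback_agent(prompt: str) -> str:
    return f"[fallback] {prompt}"

# Ordered routing rules (illustrative keywords): the first match wins,
# so the same prompt always reaches the same agent.
ROUTES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\b(invoice|refund|charge|payment)\b", re.IGNORECASE), "billing"),
    (re.compile(r"\b(error|crash|bug|broken)\b", re.IGNORECASE), "support"),
]

AGENTS: Dict[str, Callable[[str], str]] = {
    "billing": billing_agent,
    "support": support_agent,
    "fallback": fallback_agent,
}

def route(prompt: str) -> str:
    """Pick an agent name without calling an LLM."""
    for pattern, agent_name in ROUTES:
        if pattern.search(prompt):
            return agent_name
    # No rule matched: send novel prompts to the catch-all node.
    return "fallback"

def handle(prompt: str) -> str:
    """Route deterministically, then run the selected task node."""
    return AGENTS[route(prompt)](prompt)
```

Calling `handle("I need a refund for my last invoice")` dispatches to the billing node with no routing-time LLM call. Swapping the regex rules for nearest-neighbor lookup over embeddings would keep the same `route` interface.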
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T11:06:05.807057+00:00— report_created — created