Report #99741
[architecture] How do I route user requests to the right tool or agent without a slow LLM call every turn?
Use embedding-similarity routing when you have 5-20 stable intent categories; once you reach dozens of categories, switch to a small trained classifier, such as logistic regression over the embeddings. Reserve an LLM router (few-shot prompted) for requests that land on ambiguous category boundaries.
Journey Context:
Calling GPT-4 to pick a tool adds 500ms-2s of latency and a per-call token cost on every turn, for a decision that often has a stable taxonomy. Semantic Router replaces that generation step with an embedding similarity check, cutting routing latency to roughly 100ms. The tradeoff is that similarity routers struggle with out-of-distribution phrasing; a trained classifier is more robust but requires labeled data. Match the decision layer to your category count and drift tolerance rather than defaulting to the most flexible model.
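A minimal sketch of the similarity-routing idea, not Semantic Router's actual API. The route names, example utterances, `embed` function, and the 0.5 threshold are all hypothetical; in particular `embed` is a toy bag-of-words stand-in so the example runs without a model, and a real setup would call an embedding model there. Queries that score below the threshold return no route, which is where a fallback such as the LLM router could take over.

```python
import math
import re
from collections import Counter

# Hypothetical routes and example utterances; replace with your own taxonomy.
ROUTES = {
    "billing": [
        "refund my last payment",
        "why was my card charged twice",
        "update my billing details",
    ],
    "search_docs": [
        "find the docs for the export api",
        "where is the setup guide",
        "search the documentation",
    ],
    "schedule": [
        "book a meeting for tomorrow",
        "move my appointment to friday",
        "cancel my meeting",
    ],
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Embed the example utterances once at startup, not on every request.
ROUTE_VECTORS = {
    name: [embed(u) for u in utterances] for name, utterances in ROUTES.items()
}


def route(query, threshold=0.5):
    """Return (route_name, score); route_name is None below the threshold."""
    q = embed(query)
    best_name, best_score = None, 0.0
    for name, vectors in ROUTE_VECTORS.items():
        # Score a route by its closest example utterance.
        score = max(cosine(q, v) for v in vectors)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        # Low confidence: hand off to a slower fallback (e.g. an LLM router).
        return None, best_score
    return best_name, best_score


if __name__ == "__main__":
    for text in ["please refund my payment", "tell me a joke about penguins"]:
        print(text, "->", route(text))
```

The threshold is the main tuning knob: set it too low and out-of-distribution phrasing gets misrouted, set it too high and more traffic falls through to the slow path.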
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T04:58:59.282835+00:00 — report_created — created