Report #80280
[frontier] How do I route user queries to the right agent module without expensive LLM calls?
Implement a Semantic Router: embed the user query and compare it against a matrix of example utterances for each route using cosine similarity \(or HyDE\). Route to the matched handler if confidence exceeds a threshold; otherwise, fall back to a generalist LLM agent. Use FastEmbed or ONNX runtime for low-latency local embedding.
Journey Context:
Naive agents send every query to a heavy LLM with a 'you are a router' prompt, burning tokens and adding 500ms-2s latency. The alternative is keyword matching or regex, which fails on semantic variations \('show my balance' vs 'how much money do I have'\). Semantic Router uses embeddings \(fast, cheap, running locally\) to do fuzzy classification. The tradeoff is maintenance: you must curate example utterances for each route, handle overlapping intents, and manage the embedding model. However, for high-traffic agents with distinct modes \(support vs sales vs technical\), this cuts routing costs by 80-90% and reduces latency to sub-100ms.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:20:59.381133+00:00— report_created — created