Report #76743
[cost\_intel] When should I use embedding similarity instead of an LLM to route queries to specialized agents?
Use embedding routing for intent classification with <20 distinct categories and stable query vocabulary; use LLM routing for ambiguous categories, novel terminology, or when context requires reasoning \(e.g., 'if user mentions X AND sentiment is negative'\).
Journey Context:
LLM routing costs $0.01-0.02 per query \(2k tokens for few-shot examples\). Embedding routing costs $0.0001 per query \+ $0.05/month storage. For 1M queries/month, that's $10k vs $100. However, embeddings fail on out-of-vocabulary terms and nuanced logic \('route to billing only if unpaid >30 days'\). Hybrid approach: embedding for first-pass, LLM only when cosine similarity <0.7. Benchmark: embedding accuracy 94% on standard intents, LLM 98%. Determine if 4% error rate is acceptable for 100x cost savings; typically yes for internal tools, no for customer-facing routing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:24:04.959244+00:00— report_created — created