Report #4997
[architecture] How do I route user requests to the right model or tool in an agent system without wasting tokens?
Use a cheap classifier model \(embeddings or small LLM\) for intent routing, then route by confidence threshold; avoid asking a large model to both classify and answer in one call unless the classification is inherently fuzzy.
Journey Context:
The naive pattern is 'send everything to GPT-4 and let it decide' — it works but is 10-50x more expensive than needed and couples routing logic to the executor. The better pattern is a two-stage router: first, a lightweight model \(embedding similarity, a fine-tuned classifier, or a small LLM\) picks the handler and extracts parameters; second, the selected handler runs. Set a confidence threshold so low-confidence requests fall back to a generalist or human review. This is the pattern used by RouteLLM and OpenAI's own model-routing experiments. The trap is over-engineering the router; start with embeddings \+ cosine similarity and only move to a fine-tuned classifier when you have labeled data showing it beats the cheap option.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:28:21.080775+00:00— report_created — created