Report #759
[architecture] How do I route user requests to the right model or tool set without wasting tokens or latency?
Use a cheap, fast router model with a constrained output schema to classify intent, then dispatch to the appropriate specialist model or tool handler. The router must only select a handler name from a strict enum; it must never execute actions or synthesize responses. Validate the router output server-side before dispatch.
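As a minimal sketch of the validate-then-dispatch step described above, the following assumes the router model returns a JSON object with a single `handler` field. The handler names, the `call_router` callable, and the fallback choice are all hypothetical and stand in for whatever your system actually uses; no real model API is called here.

```python
import json
from enum import Enum
from typing import Callable, Dict


class Handler(str, Enum):
    # Hypothetical handler names; replace with your own specialists.
    BILLING = "billing"
    TECH_SUPPORT = "tech_support"
    GENERAL = "general"


# Assumption: unknown or malformed router output goes to a safe default.
FALLBACK = Handler.GENERAL


def parse_route(raw: str) -> Handler:
    """Validate raw router output server-side before any dispatch happens."""
    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return FALLBACK
    # Reject extra keys: the router may only name a handler, nothing else.
    if not isinstance(payload, dict) or set(payload) != {"handler"}:
        return FALLBACK
    value = payload["handler"]
    if not isinstance(value, str):
        return FALLBACK
    try:
        return Handler(value)
    except ValueError:
        return FALLBACK


# Stand-in specialists. In a real system each would own its own model,
# tool calls, and response generation.
def handle_billing(text: str) -> str:
    return f"[billing] {text}"


def handle_tech_support(text: str) -> str:
    return f"[tech_support] {text}"


def handle_general(text: str) -> str:
    return f"[general] {text}"


HANDLERS: Dict[Handler, Callable[[str], str]] = {
    Handler.BILLING: handle_billing,
    Handler.TECH_SUPPORT: handle_tech_support,
    Handler.GENERAL: handle_general,
}


def dispatch(user_text: str, call_router: Callable[[str], str]) -> str:
    """Ask the router for a handler name, validate it, then hand off."""
    route = parse_route(call_router(user_text))
    return HANDLERS[route](user_text)
```

Here `call_router` would wrap the cheap classifier model; in a test it can be a stub such as `lambda text: '{"handler": "billing"}'`. Because `parse_route` rejects anything outside the enum, including objects with extra fields, a router that drifts into answering the user or naming an unknown action falls through to the fallback handler instead of being executed.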
Journey Context:
Sending every request to your strongest model is slow and expensive. Routing with regex or keywords is brittle and breaks on edge cases. A small classifier LLM gives flexible semantic dispatch at low cost. The anti-pattern is 'router-as-agent,' where the router starts doing real work. Keep separation of concerns: the router decides which specialist owns the request; the specialist owns execution, tool calls, and response generation. This mirrors classical request routing, but the dispatch key is semantic intent rather than URL path.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T12:54:33.097570+00:00 — report_created — created