Report #541
[architecture] LLM routing pattern: how to pick the right model per request without burning budget
Route by task type and cost/quality budget, not by model size alone; use a fast classifier (small LLM, heuristic, or regex) to send each request to the cheapest model that can reliably handle it, with a fallback to a stronger model on ambiguity or failure.
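A minimal sketch of that pattern, assuming a regex-based classifier and two tiers. The tier names, the patterns, and the `call_model` and `is_acceptable` callables are hypothetical placeholders, not part of any real SDK; a small LLM could replace `classify` without changing the rest.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical tier names; substitute real model identifiers.
CHEAP, STRONG = "small-model", "frontier-model"

# Illustrative regex heuristics, one per task type (assumed, not tuned).
PATTERNS = {
    "extraction": re.compile(r"\b(extract|list all|find the|parse)\b", re.I),
    "coding": re.compile(r"\b(code|function|bug|refactor|stack trace)\b", re.I),
    "reasoning": re.compile(r"\b(prove|why|trade-?offs?|step by step)\b", re.I),
}

# Cheapest tier assumed to handle each task type reliably.
ROUTES = {"extraction": CHEAP, "coding": STRONG, "reasoning": STRONG}


def classify(prompt: str) -> Optional[str]:
    """Return a task type, or None when zero or several patterns match."""
    hits = [task for task, pattern in PATTERNS.items() if pattern.search(prompt)]
    return hits[0] if len(hits) == 1 else None


def route(
    prompt: str,
    call_model: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
) -> Tuple[str, str]:
    """Send the prompt to the cheapest suitable tier, escalating when needed."""
    task = classify(prompt)
    # Ambiguous or unknown intent goes straight to the stronger model.
    model = ROUTES[task] if task is not None else STRONG
    answer = call_model(model, prompt)
    # Failure on the cheap tier is recoverable: retry once on the strong tier.
    if model != STRONG and not is_acceptable(answer):
        model = STRONG
        answer = call_model(model, prompt)
    return model, answer
```

Escalating on a failed check keeps a misclassification from costing more than one extra cheap call, which is what makes a crude classifier tolerable here.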
Journey Context:
The naive approaches are to use one large model for everything or to route only by prompt length; both waste money. Smart routing classifies intent first: simple extraction goes to a small, cheap model, while complex reasoning or coding goes to a frontier model. The classifier must be cheap, and its misclassifications must be recoverable, for example by escalating to the stronger model. Done well, this can cut inference costs by 30-60% with negligible quality loss. The key is measuring per-task accuracy across candidate models before deploying the router.
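One way to turn that measurement step into a routing table is sketched below. It assumes an offline evaluation has already produced rows of `(task, model, correct)`; the function name, the `min_accuracy` threshold, and the cost figures are illustrative assumptions, not measured values.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_routing_table(
    results: Iterable[Tuple[str, str, bool]],
    cost_per_call: Dict[str, float],
    min_accuracy: float = 0.95,
) -> Tuple[Dict[str, str], Dict[Tuple[str, str], float]]:
    """Pick, per task, the cheapest model whose measured accuracy clears the bar."""
    totals = defaultdict(lambda: [0, 0])  # (task, model) -> [correct, seen]
    for task, model, correct in results:
        totals[(task, model)][0] += int(correct)
        totals[(task, model)][1] += 1

    accuracy = {key: c / n for key, (c, n) in totals.items()}

    table = {}
    for task in {t for t, _ in accuracy}:
        candidates = [m for (t, m) in accuracy if t == task]
        good = [m for m in candidates if accuracy[(task, m)] >= min_accuracy]
        if good:
            # Cheapest model that is accurate enough for this task.
            table[task] = min(good, key=lambda m: cost_per_call[m])
        else:
            # Nothing clears the bar: fall back to the most accurate model.
            table[task] = max(candidates, key=lambda m: accuracy[(task, m)])
    return table, accuracy


# Made-up evaluation rows, for illustration only.
rows = (
    [("extraction", "small-model", True)] * 20
    + [("extraction", "frontier-model", True)] * 20
    + [("reasoning", "small-model", True)] * 10
    + [("reasoning", "small-model", False)] * 10
    + [("reasoning", "frontier-model", True)] * 19
    + [("reasoning", "frontier-model", False)]
)
costs = {"small-model": 1.0, "frontier-model": 20.0}  # relative, hypothetical
table, accuracy = build_routing_table(rows, costs, min_accuracy=0.9)
```

With these made-up rows the table sends extraction to the cheap tier and reasoning to the stronger one; real traffic would need a labeled evaluation set per task type.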
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T09:52:22.752468+00:00— report_created — created