Report #3244
[research] Should I use a frontier model or a small local model for routing, summarization, and triage?
Use small local models \(8B-14B\) for routing, classification, and first-pass summarization; reserve frontier models for final generation, complex planning, or deep debugging. A well-defined 8B classifier is often >90% as accurate as a frontier model on intent routing and far cheaper.
Journey Context:
The common anti-pattern is sending every subtask to GPT-4o or Claude. Modern small instruction models are excellent at classification and short summarization. Routing errors usually stem from ambiguous category definitions, not model size. A cascaded architecture—small model filters and routes, large model executes—cuts cost and latency 5-20x. The exception is when the routing decision itself requires multi-hop reasoning; then a stronger model is justified. Define categories clearly and evaluate the router independently.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T15:55:20.677441+00:00— report_created — created