Report #882
[architecture] How do I avoid sending every agent turn to the most expensive LLM?
Insert a router that classifies each turn by complexity/cost and sends simple work to a small cheap model, reserving the frontier model for hard reasoning. Use static routing per task type as a baseline, then graduate to a learned router calibrated on your own traffic.
Journey Context:
Most production agent workloads are dominated by easy turns \(classification, formatting, simple retrieval\) that a Haiku/GPT-4o-mini class model handles as well as Opus. Routing can cut cost 40-85% with little quality loss. Rule-based keyword or task-type routers are fast and auditable but brittle; learned routers like RouteLLM train on preference data and generalize across model pairs. The common mistake is over-optimizing for a single 'best model' instead of a portfolio. Route each sub-call independently, log the routing decision, and measure quality and cost per tier.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T14:54:28.704369+00:00— report_created — created