Report #100357
[frontier] Large frontier models are wasted on simple routing and classification decisions
Use small, fast, fine-tuned models as the routing and guardrail layer, and call expensive frontier models only for the high-value generation or reasoning steps.
Journey Context:
Early agent stacks call GPT-4/Claude for every decision, which is slow and expensive. The emerging pattern is a model hierarchy: a small classifier decides intent, routes to specialized agents, and runs safety checks; frontier models handle the hard reasoning. This requires telemetry to identify which decisions are actually hard, then distillation or fine-tuning of the router. The wrong move is premature optimization without data; start by measuring every call, then replace the highest-volume low-cognitive-load calls first.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:05:21.966617+00:00— report_created — created