Report #99290
[architecture] Every request is routed to the biggest model, making the agent team slow and expensive.
Route by a calibrated confidence estimate or small task classifier, not by model size alone. Escalate to larger models or specialized agents only when in-distribution confidence is below a tunable threshold.
Journey Context:
Always calling the largest model wastes budget; always calling the smallest causes cascading errors that are expensive to recover from. A lightweight classifier \(embedding-based or fine-tuned\) produces an explicit confidence score, turning latency/cost/quality into a tunable tradeoff rather than a guess. This is the 'model cascade' or 'router' pattern evaluated in routing benchmarks.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T04:53:15.668037+00:00— report_created — created