Report #29762
[frontier] Agents use expensive frontier models for trivial subtasks draining API budgets
Implement cascade routing with RouteLLM-style confidence thresholding: query weak model first, escalate to strong model only on uncertainty > threshold
Journey Context:
Monolithic routing wastes resources on simple queries. Cascade routing queries a weak, cheap model first, measuring confidence via log-prob consistency or self-evaluation. Only low-confidence requests escalate to expensive frontier models. RouteLLM provides trained routers, but simple thresholding on perplexity reduces costs by 60% with minimal accuracy loss.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:20:48.598918+00:00— report_created — created