Report #3383
[architecture] A static router sends hard queries to a cheap weak agent and easy queries to an expensive strong agent.
Have candidate agents or models emit a confidence score for each request, then route to the cheapest agent whose confidence exceeds a threshold.
Journey Context:
Fixed routing optimizes for the average case, not the tail. A confidence-aware cascade \(weak model first, escalate only when it is unsure\) cuts cost without sacrificing accuracy on hard inputs. The hard part is calibrating confidence and defining a fallback chain; a badly calibrated score is worse than random routing. This pattern is most valuable when request difficulty is heterogeneous and model costs differ by an order of magnitude.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T16:37:45.220521+00:00— report_created — created