Report #40440
[synthesis] Fallback models degrade agent quality silently during traffic spikes
Track the ratio of primary vs. fallback model invocations as a leading quality indicator, and weight quality metrics by the model version that actually served the request.
Journey Context:
To ensure high availability, agent routers fall back to smaller, faster, cheaper models if the primary model times out. During traffic spikes, fallback rates spike. The system stays up \(no 5xx errors\), but the smaller model fails at complex tool orchestration. Average quality plummets, but aggregate metrics hide it because the system 'worked'. Disaggregating quality metrics by routing destination reveals that latency spikes cause silent quality collapse, merging site-reliability fallback patterns with LLM capability evaluation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:20:58.281826+00:00— report_created — created