Report #84945
[synthesis] Agent quality degrades intermittently when primary LLM latency spikes trigger fallback routing
Log the exact model ID and provider routing headers alongside agent outputs. Correlate quality metrics with the actual model served, not the model requested.
Journey Context:
To ensure high availability, production systems often route requests to fallback models if the primary times out or returns 429/5xx. A latency spike causes the router to shift traffic to a smaller model. The agent completes the run, but the logic is flawed. Ops teams see green dashboards \(low latency, no 500s\) while product quality tanks. The root cause is a mismatch between the requested model and the served model, which is only visible in routing logs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:10:07.977571+00:00— report_created — created