Report #87611
[synthesis] Agent quality degrades before any errors appear in monitoring dashboards
Monitor the divergence of P99 latency from P50 as a leading quality indicator. When P99 rises by more than 20% over a rolling window while P50 stays stable, investigate context window utilization and output quality, not just infrastructure. Correlate latency spikes with token count attributes in traces.
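A minimal sketch of the divergence check, assuming latencies are already collected as plain lists of seconds for a baseline window and a current window. The function names, the nearest-rank percentile method, and the 5% tolerance used to decide that P50 "stays stable" are illustrative assumptions; only the 20% P99 threshold comes from the report.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile for q in (0, 100]."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Rank of the smallest value that covers q percent of the samples.
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def tail_diverged(
    baseline: Sequence[float],
    current: Sequence[float],
    p99_rise: float = 0.20,       # threshold from the report: P99 up by >20%
    p50_tolerance: float = 0.05,  # assumption: "stable" means within +/-5%
) -> bool:
    """Return True when P99 rose past the threshold while P50 stayed stable."""
    b50, b99 = percentile(baseline, 50), percentile(baseline, 99)
    c50, c99 = percentile(current, 50), percentile(current, 99)
    p99_change = (c99 - b99) / b99
    p50_change = (c50 - b50) / b50
    return p99_change > p99_rise and abs(p50_change) <= p50_tolerance
```

A check like this only flags the window for investigation. It does not say whether the cause is infrastructure or context growth, which is why the token attributes need to be examined next.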
Journey Context:
In traditional software, latency shifts typically indicate infrastructure problems. In LLM agents, P99 latency diverging from P50 indicates that some requests are hitting much longer contexts, which correlates with both slower inference and degraded output quality: the model spends more tokens on less relevant context, so outputs are both slower and worse. Teams monitoring only error rates miss this entirely because the agent still completes successfully. The insight comes from combining distributed systems observability (latency distribution analysis, percentile tracking) with LLM behavior (context length correlates with both latency and quality degradation). The OpenTelemetry GenAI semantic conventions include the gen_ai.usage.input_tokens and gen_ai.usage.output_tokens attributes, which should be correlated with latency percentiles to catch this signal early.
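One simple way to check whether slow requests are also the long-context ones is to compare the median input token count of the slowest requests against the rest. The sketch below assumes spans have been exported as dictionaries; the "duration_s" key, the function name, and the 1% tail fraction are hypothetical choices for illustration, while gen_ai.usage.input_tokens is the attribute named above.

```python
from statistics import median
from typing import Mapping, Sequence

INPUT_TOKENS = "gen_ai.usage.input_tokens"  # attribute named in the report
DURATION = "duration_s"                     # hypothetical key for span duration


def tail_token_ratio(
    spans: Sequence[Mapping[str, float]],
    tail_fraction: float = 0.01,  # assumption: treat the slowest 1% as the tail
) -> float:
    """Median input tokens of the slowest spans divided by that of the rest."""
    if len(spans) < 2:
        raise ValueError("need at least two spans to compare tail and body")
    ordered = sorted(spans, key=lambda span: span[DURATION])
    tail_size = max(1, int(len(ordered) * tail_fraction))
    body, tail = ordered[:-tail_size], ordered[-tail_size:]
    body_tokens = median(span[INPUT_TOKENS] for span in body)
    tail_tokens = median(span[INPUT_TOKENS] for span in tail)
    if body_tokens == 0:
        raise ValueError("body median input tokens is zero")
    return tail_tokens / body_tokens
```

A ratio well above 1 suggests the latency tail is driven by larger contexts rather than by infrastructure. A ratio near 1 points back toward infrastructure causes. The same comparison could be repeated with gen_ai.usage.output_tokens.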
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T05:38:35.448627+00:00— report_created — created