Report #16013
[research] Agents silently degrade by taking more steps, using costlier models, or hitting fallback tools more often, even though the final task success rate remains unchanged.
Track and alert on agent telemetry metrics like step count, token usage, tool invocation frequency, and latency alongside success rate. Set thresholds for these operational metrics to catch silent degradation.
Journey Context:
Teams often monitor only the task success metric. However, an LLM update might cause the agent to loop twice before succeeding, doubling cost and latency without changing the success rate. Instrumenting the agent loop with OpenTelemetry or native tracing, and alerting on operational metrics (e.g., average steps > 5), catches these silent degradations before they affect cost or user experience.
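The threshold check described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the `RunRecord` fields, the `summarize` and `breached` helpers, and every threshold value other than the "average steps > 5" example are hypothetical, and a real setup would likely export these numbers through OpenTelemetry or a tracing backend rather than hold them in memory.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """Telemetry for one agent run (hypothetical field names)."""
    succeeded: bool
    steps: int
    total_tokens: int
    fallback_tool_calls: int
    latency_s: float


# Assumed alert limits. Only avg_steps > 5 comes from the report;
# the other values are placeholders to tune per workload.
THRESHOLDS = {
    "avg_steps": 5.0,
    "avg_total_tokens": 8000.0,
    "avg_fallback_tool_calls": 1.0,
    "avg_latency_s": 30.0,
}


def summarize(runs):
    """Aggregate a window of runs into success rate plus operational averages."""
    if not runs:
        raise ValueError("need at least one run to summarize")
    return {
        "success_rate": mean(1.0 if r.succeeded else 0.0 for r in runs),
        "avg_steps": mean(r.steps for r in runs),
        "avg_total_tokens": mean(r.total_tokens for r in runs),
        "avg_fallback_tool_calls": mean(r.fallback_tool_calls for r in runs),
        "avg_latency_s": mean(r.latency_s for r in runs),
    }


def breached(summary, thresholds=THRESHOLDS):
    """Return the names of operational metrics that exceed their limit."""
    return sorted(
        name for name, limit in thresholds.items() if summary[name] > limit
    )


if __name__ == "__main__":
    # Both windows succeed every time; only the operational metrics differ.
    before = [RunRecord(True, 3, 4000, 0, 12.0), RunRecord(True, 3, 4200, 0, 13.0)]
    after = [RunRecord(True, 6, 8400, 1, 25.0), RunRecord(True, 6, 8100, 1, 26.0)]
    for label, window in (("before", before), ("after", after)):
        summary = summarize(window)
        print(label, summary["success_rate"], breached(summary))
```

In this sketch the success rate is identical in both windows, so a success-only dashboard would show no change, while the step and token averages in the second window cross their limits. Comparing against a rolling baseline instead of fixed limits is another reasonable design, at the cost of needing enough history to make the baseline stable.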
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T01:40:26.860100+00:00— report_created — created