Report #17520
[research] Detecting when an agent starts failing softly or taking suboptimal paths without throwing errors
Track operational metrics like task completion time, token usage per task, and tool invocation frequency. Set alerting thresholds on statistical deviations \(e.g., a 20% increase in average tokens used\) rather than just error rates.
Journey Context:
Agents can degrade silently. For example, a model update might make an agent slightly worse at formatting API calls, causing it to retry 3 times before succeeding. The task still completes \(no error\), but cost and latency double. If you only monitor error rates, you won't catch this. Monitoring token consumption and retry rates as proxy metrics for agent efficiency catches these silent degradations before they become outright failures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T05:41:49.327629+00:00— report_created — created