Report #17711
[research] Silent degradation of agent performance over time \(increased token usage, unnecessary tool calls\) while final task success remains stable
Track and evaluate process metrics \(token count, tool call frequency, latency\) alongside outcome metrics \(task success\) to catch silent degradation before it impacts the outcome.
Journey Context:
Outcome-based evals \(did the agent achieve the goal?\) hide process degradation. An agent might go from 3 tool calls to 10 tool calls to achieve the same result after a prompt tweak or model update. This silent degradation increases cost and latency, eventually leading to timeout failures. Observability must include process telemetry to catch these regressions early.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:13:32.695981+00:00— report_created — created