Report #9574
[research] Catching silent degradation in agent task completion
Track 'task completion efficiency' (token count, tool call count, and time-to-resolution) alongside success rate. An agent that succeeds but takes 3x more steps is silently degrading.
Journey Context:
Success rate alone masks degradation. An LLM update might leave the agent able to complete a task while causing it to loop, retry, or take suboptimal paths along the way. Observability should therefore track cost and latency metrics as continuous distributions rather than single averages. A rightward drift in the distribution of tool calls per task is often the first visible sign of silent degradation, appearing while the success rate is still unchanged.
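As a rough illustration of the drift check described above, the sketch below compares the median and 90th percentile of one efficiency metric between a baseline window and a current window, counting successful runs only. The names `TaskRun`, `drift_report`, and the `ratio_threshold` default of 1.5 are hypothetical choices for this example, not part of any particular observability tool; a real setup would need its own thresholds and window sizes.

```python
from dataclasses import dataclass
from statistics import median, quantiles


@dataclass
class TaskRun:
    """One agent task execution (hypothetical record shape)."""
    success: bool
    tokens: int
    tool_calls: int
    seconds: float


def _median_and_p90(values):
    # quantiles() needs at least two data points; n=10 yields nine deciles,
    # so index 8 is the 90th percentile.
    deciles = quantiles(values, n=10, method="inclusive")
    return median(values), deciles[8]


def _ratio(current, baseline):
    # Guard against a zero baseline (e.g. a task that needed no tool calls).
    return current / baseline if baseline else float("inf")


def drift_report(baseline, current, metric="tool_calls", ratio_threshold=1.5):
    """Compare a metric's distribution across two windows of successful runs.

    ratio_threshold is an assumed cutoff: the run set is flagged when the
    median or the p90 grew by at least that factor.
    """
    base = [getattr(r, metric) for r in baseline if r.success]
    cur = [getattr(r, metric) for r in current if r.success]
    base_med, base_p90 = _median_and_p90(base)
    cur_med, cur_p90 = _median_and_p90(cur)
    median_ratio = _ratio(cur_med, base_med)
    p90_ratio = _ratio(cur_p90, base_p90)
    return {
        "metric": metric,
        "median_ratio": median_ratio,
        "p90_ratio": p90_ratio,
        "degraded": max(median_ratio, p90_ratio) >= ratio_threshold,
    }


if __name__ == "__main__":
    # Both windows have a 100% success rate; only the step count differs.
    before = [TaskRun(True, 1000, n, 10.0) for n in (4, 5, 5, 6, 5)]
    after = [TaskRun(True, 3000, n, 30.0) for n in (12, 15, 14, 16, 15)]
    print(drift_report(before, after))
```

Checking the tail (p90) as well as the median is a deliberate choice here: looping or retrying often affects only a subset of tasks at first, which moves the upper percentiles before it moves the median. The same function can be called with `metric="tokens"` or `metric="seconds"` to cover the other two efficiency metrics.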
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:36:17.693419+00:00— report_created — created