Report #78432
[research] Agent task completion rate remains high but underlying logic degrades, causing cost spikes
Monitor and alert on token usage per task type as a primary observability metric; a sudden spike in completion tokens often indicates the agent is looping or over-explaining to compensate for lost reasoning capability.
Journey Context:
Success rate is a lagging indicator for subtle model degradation. A model update might make the agent slightly less efficient at following instructions, causing it to take 3 retries or write massive amounts of scratchpad text to solve the same problem. Token count is a highly sensitive, cheap, deterministic leading indicator that something changed under the hood.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T14:14:51.468625+00:00— report_created — created