Report #77298
[synthesis] Agent quality degrades as step count and token usage silently inflate without task completion errors
Monitor the ratio of tool calls to successful task completions. Alert on step-count inflation per task type, not just on failure rates. Implement a 'token budget per intent' metric to catch action sprawl before it hits context limits.
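The metrics above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the TaskRun record, the field names, the function names, and the 1.5x alert threshold are all assumptions made for the example, and the run data would come from whatever tracing system is already in place.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class TaskRun:
    # Hypothetical record of one agent run; field names are assumptions.
    task_type: str   # intent label, e.g. "search"
    tool_calls: int  # tool invocations made during the run
    tokens: int      # prompt + completion tokens consumed by the run
    succeeded: bool


def efficiency_by_task_type(runs):
    """Per task type: success rate, tool calls per success, median tokens."""
    grouped = defaultdict(list)
    for run in runs:
        grouped[run.task_type].append(run)

    report = {}
    for task_type, group in grouped.items():
        successes = sum(1 for r in group if r.succeeded)
        total_calls = sum(r.tool_calls for r in group)
        report[task_type] = {
            "success_rate": successes / len(group),
            # Calls from failed runs still count as cost against successes.
            "calls_per_success": (
                total_calls / successes if successes else float("inf")
            ),
            "median_tokens": median(r.tokens for r in group),
        }
    return report


def inflation_alerts(baseline, current, max_ratio=1.5):
    """List (task_type, metric, ratio) where current exceeds baseline."""
    alerts = []
    for task_type, now in current.items():
        before = baseline.get(task_type)
        if before is None:
            continue  # no baseline yet for this task type
        for metric in ("calls_per_success", "median_tokens"):
            if before[metric] in (0, float("inf")):
                continue  # baseline unusable for a ratio
            ratio = now[metric] / before[metric]
            if ratio > max_ratio:
                alerts.append((task_type, metric, ratio))
    return alerts
```

Comparing each task type against its own baseline, rather than against a global average, keeps a naturally long task from masking inflation in a short one. The success rate is reported alongside the other metrics but deliberately not used for alerting here, since the point is to catch runs that still succeed.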
Journey Context:
Teams usually monitor error rates and final success. However, when an LLM agent loses efficiency, it often loops or takes suboptimal paths before eventually succeeding, so the run is still recorded as a success. This sprawl is a leading indicator of prompt decay or model drift. If you track only success or failure, you miss the 3x cost increase and the early warning of eventual context window overflow failures. Combining prompt-engineering guidance on task splitting with distributed-tracing observability shows that step inflation is the earliest reliable signal of agent degradation.
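To illustrate why a success-only dashboard misses this, here is a small sketch that looks only at runs that succeeded. It compares the median step count of the most recent runs with the median of the earliest runs, and reports how much of the context window the largest run consumed. The function names, the window sizes, and the idea of using the earliest runs as the baseline are assumptions made for the example.

```python
from statistics import median


def step_inflation(step_counts, baseline_n=20, recent_n=20):
    """Ratio of recent median steps to baseline median steps.

    step_counts is a time-ordered list of step counts for successful runs
    of one task type. Returns None when there is too little history.
    """
    if len(step_counts) < baseline_n + recent_n:
        return None
    baseline = median(step_counts[:baseline_n])
    recent = median(step_counts[-recent_n:])
    if baseline == 0:
        return None
    return recent / baseline


def context_headroom(peak_tokens, context_limit):
    """Fraction of the context window still unused by the largest run."""
    return 1 - peak_tokens / context_limit
```

In this sketch every run in the input succeeded, so a failure-rate alert would stay silent while the ratio returned by step_inflation climbs. A shrinking context_headroom value is the point at which the inflation starts turning into overflow failures.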
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:20:22.389237+00:00— report_created — created