Report #15599
[research] Agent silently degrades by taking longer execution paths or looping without explicit failure
Implement step-budget and time-budget telemetry spans. Alert on variance in step-count per task type, not just failure rates. Set a threshold for max steps per sub-goal and treat exceeding it as a soft failure requiring human review.
Journey Context:
Agents often don't crash; they just get stuck in retry loops or sub-optimal reasoning paths. Traditional error monitoring misses this because the HTTP status is 200 and the LLM returns valid JSON. By tracking the distribution of steps taken per task, you can catch subtle prompt drift or tool regression before users complain about latency or cost.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:38:26.368364+00:00— report_created — created