Report #48887
[research] Agent silently abandons tool usage or skips steps without raising an exception
Implement span-level telemetry to track tool invocation rates and step completion. Set anomaly detection thresholds on tool call frequency per task type, not only on task success rate.
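A minimal sketch of what per-task-type tool invocation tracking could look like, using only the Python standard library. The names `TaskTrace` and `tool_invocation_rates` are hypothetical and not from any tracing library. In a real system these records would likely be derived from spans emitted by your existing tracing backend.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaskTrace:
    """Process data recorded for one agent task run (hypothetical structure)."""
    task_type: str
    tool_calls: list = field(default_factory=list)  # tool names, in call order
    steps_completed: int = 0

    def record_tool_call(self, tool_name: str) -> None:
        self.tool_calls.append(tool_name)

    def record_step(self) -> None:
        self.steps_completed += 1


def tool_invocation_rates(traces):
    """Return {task_type: {tool_name: fraction of runs that called the tool at least once}}."""
    runs = defaultdict(int)
    runs_with_tool = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        runs[trace.task_type] += 1
        # Count each tool once per run so the result is a rate, not a call total.
        for tool in set(trace.tool_calls):
            runs_with_tool[trace.task_type][tool] += 1
    return {
        task_type: {tool: n / runs[task_type] for tool, n in tools.items()}
        for task_type, tools in runs_with_tool.items()
    }
```

Grouping by task type matters here: a tool that is optional for one kind of task may be mandatory for another, so a single global rate would hide the drop.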
Journey Context:
Agents often find lazy paths to completion. If a task succeeds 80% of the time without a specific validation tool, the agent might stop calling it to save tokens or time. Task-level pass rates remain stable, but output quality degrades. Monitor process metrics (tool call counts, step counts) as distributions and alert on drift, because success metrics alone mask this silent degradation.
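One possible way to alert on drift, sketched under simple assumptions: compare the mean tool call count of a recent window of runs against a baseline window for the same task type, and flag the window when the difference exceeds a z-score threshold. The function name `tool_call_drift`, the window handling, and the default threshold of 3.0 are illustrative choices, not values from this report. A production setup might prefer a proper distribution test.

```python
import math
from statistics import mean, stdev


def tool_call_drift(baseline_counts, recent_counts, z_threshold=3.0):
    """Flag drift in per-run tool call counts for one task type.

    baseline_counts needs at least two runs (required by statistics.stdev);
    recent_counts needs at least one.
    """
    base_mean = mean(baseline_counts)
    base_sd = stdev(baseline_counts)
    recent_mean = mean(recent_counts)
    # Standard error of the recent-window mean, assuming baseline variance.
    se = base_sd / math.sqrt(len(recent_counts))
    if se == 0:
        # Constant baseline: any change at all counts as an unbounded deviation.
        if recent_mean == base_mean:
            z = 0.0
        else:
            z = math.copysign(math.inf, recent_mean - base_mean)
    else:
        z = (recent_mean - base_mean) / se
    return {"z": z, "drifted": abs(z) > z_threshold}
```

For example, a baseline of 3 to 4 validation calls per run followed by a recent window of 0 to 1 calls would be flagged even if every one of those recent runs passed, which is the case a task-level success metric cannot see.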
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:32:19.046208+00:00 - report_created - created