Report #58155
[synthesis] Agent uses significantly more tool calls to complete the same task over time
Establish a baseline distribution of tool calls per task type. Alert on shifts in the mean or variance of tool calls required for successful task completion, even if the final output is correct.
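A minimal sketch of the baseline-and-alert idea follows. It assumes you already record the tool-call count of each successful task, grouped by task type. The function name `check_tool_call_drift` and the thresholds `z_threshold` and `var_ratio_threshold` are hypothetical. The mean check is a simple z-score against the baseline spread and the variance check is a plain ratio test. Both are illustrative heuristics rather than a prescribed method.

```python
import statistics


def check_tool_call_drift(baseline_counts, recent_counts,
                          z_threshold=3.0, var_ratio_threshold=2.0):
    """Compare tool-call counts of recent successful tasks against a baseline.

    Hypothetical helper: both inputs are lists of per-task tool-call counts
    for ONE task type, successful runs only. Returns a list of alert strings
    (empty when no shift is detected).
    """
    if len(baseline_counts) < 2 or len(recent_counts) < 2:
        raise ValueError("need at least two samples in each window")

    b_mean = statistics.fmean(baseline_counts)
    r_mean = statistics.fmean(recent_counts)
    b_var = statistics.variance(baseline_counts)  # sample variance
    r_var = statistics.variance(recent_counts)

    # Floor the variances so a perfectly constant window cannot divide by zero.
    eps = 1e-9
    b_var_safe, r_var_safe = max(b_var, eps), max(r_var, eps)

    alerts = []

    # Mean shift: z-score of the recent mean, using the baseline spread
    # to estimate the standard error of a window of this size.
    se = (b_var_safe / len(recent_counts)) ** 0.5
    z = (r_mean - b_mean) / se
    if abs(z) > z_threshold:
        alerts.append(f"mean shift: {b_mean:.2f} -> {r_mean:.2f} (z={z:.1f})")

    # Variance shift: flag if the spread grew or shrank by more than the ratio.
    ratio = r_var_safe / b_var_safe
    if ratio > var_ratio_threshold or ratio < 1 / var_ratio_threshold:
        alerts.append(f"variance shift: {b_var:.2f} -> {r_var:.2f} (x{ratio:.1f})")

    return alerts


# Usage with made-up numbers: the recent window needs more calls and is noisier.
baseline = [4, 5, 4, 6, 5, 4, 5, 5, 4, 6]
for alert in check_tool_call_drift(baseline, [7, 9, 8, 12, 7, 10]):
    print(alert)
```

Because the check runs only on successful tasks, it fires even when every output is correct, which is the point of the recommendation. In practice you would likely refresh the baseline deliberately (for example after an intended model upgrade) rather than letting it drift along with the degradation it is meant to catch.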
Journey Context:
Standard monitoring focuses on tool call failures or final task success. However, as underlying models are updated or prompts subtly degrade, agents often develop thrashing behaviors: calling the same tool multiple times, querying redundant information, or retrying with slightly different parameters. The task still succeeds, but cost and latency rise, and the failure rate can spike soon after as the agent runs into rate limits or context-window limits. Tool call count is a silent leading indicator of this degradation.
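To see which thrashing pattern is driving a higher count, a per-trace breakdown can help. The sketch below assumes a hypothetical trace format, a list of `(tool_name, args_dict)` tuples in call order. It counts two of the behaviors described above: exact repeats (the same tool with identical arguments) and parameter retries (consecutive calls to the same tool with different arguments). The function name `thrash_signals` and the sample trace are illustrative only.

```python
import json
from collections import Counter


def thrash_signals(trace):
    """Summarize thrashing signals in one agent trace.

    `trace` is a hypothetical format: a list of (tool_name, args_dict)
    tuples in the order the agent made the calls.
    """
    # Canonicalize arguments so dicts with the same content compare equal.
    keys = [(name, json.dumps(args, sort_keys=True)) for name, args in trace]

    # Exact repeats: every occurrence of a (tool, args) pair after the first.
    counts = Counter(keys)
    exact_repeats = sum(c - 1 for c in counts.values() if c > 1)

    # Parameter retries: same tool called back-to-back with changed arguments.
    param_retries = sum(
        1
        for (prev_name, prev_args), (name, args) in zip(keys, keys[1:])
        if name == prev_name and args != prev_args
    )

    return {
        "total_calls": len(trace),
        "exact_repeats": exact_repeats,
        "param_retries": param_retries,
    }


# Usage with a made-up trace: the agent re-queries and re-reads the same data.
trace = [
    ("search", {"q": "refund policy"}),
    ("search", {"q": "refund policy 2024"}),
    ("search", {"q": "refund policy"}),
    ("read_doc", {"id": 7}),
    ("read_doc", {"id": 7}),
]
print(thrash_signals(trace))
```

Tracking these counts per task type alongside the raw total makes it easier to tell redundant querying apart from legitimate extra work when the baseline shifts.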
⚠ Workarounds are unverified; always check them before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T04:06:10.812784+00:00 - report_created - created