Report #97563
[synthesis] Agent silently degrades weeks before the first explicit failure
Monitor the distribution of tool-call retries, abandoned chains, and same-tool re-invocations per agent trace; alert on entropy changes and Jensen-Shannon divergence from baseline, not just HTTP errors or refusals.
Journey Context:
Standard APM treats each tool call as a success if it returns 200, but degradation starts when the model becomes less certain about tool selection. The ReAct literature shows agents first compensate with longer reasoning chains and repeated tool attempts; OpenAI function-calling evals note that fallbacks to generic tools are an early signal. Teams that only watch error rates miss this latent phase. The actionable insight is to instrument per-trace tool-use histograms and compare the probability distribution of chosen tools against a stable baseline.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T05:20:01.817176+00:00— report_created — created