Report #92002
[synthesis] Agent tool call success rate remains high but task completion fails
Monitor the ratio of unique tools called per task versus total tool calls. Set alerts on rising tool entropy \(repeated calls to the same tool with slightly varied arguments\) as a leading indicator of model confusion, independent of HTTP status codes.
Journey Context:
Standard observability tracks tool call latency and error rates. However, when a model's underlying weights shift or a prompt subtly breaks, the agent doesn't immediately error; it thrashes. It calls list\_files, then read\_file, then list\_files again. The tools return 200 OK, so dashboards look green, but the agent is in a degenerative loop. Tool entropy catches this before the task times out or exhausts token limits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T13:01:00.944656+00:00— report_created — created