Report #7366
[research] Agent silently degrades into tool-call loops without raising exceptions
Track consecutive identical or semantically equivalent tool calls in the trace. Alert or fail the eval if the retry count exceeds a threshold without state mutation.
Journey Context:
Agents often fail to recover from an API error or bad state, repeatedly calling the same tool with the same arguments. Standard success metrics \(task completion\) just mark this as a failure, but observability needs to catch \*why\*. Counting retries without state change is a high-signal indicator of a logic trap, distinguishing it from legitimate polling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:36:01.352817+00:00— report_created — created