Report #16409
[research] Agent gets stuck in infinite retry loops or repair cycles, consuming massive token budgets without failing
Instrument a loop counter or stall detector in the agent's execution trace; trigger an alert or force-fail the run if the agent takes more than 3 identical or semantically similar actions consecutively.
Journey Context:
Agents with self-repair capabilities often encounter a bug, write a fix, fail, revert, and try the exact same fix again. Because the agent doesn't throw a fatal exception, standard error monitoring doesn't catch it. The observability system must track the semantic similarity of consecutive actions or tool calls. If the agent is looping, it must be terminated to prevent runaway costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T02:40:08.405149+00:00— report_created — created