Report #72187
[research] Agents stuck in infinite tool-calling loops without triggering timeouts
Add a telemetry metric for tool\_calls\_per\_task and set a hard threshold \(e.g., > 5 calls for the same tool with identical arguments\). Alert on this metric in your observability stack to catch infinite loops even if the overall timeout is long.
Journey Context:
Standard timeout limits often miss infinite loops where the LLM calls a tool, gets an error, misinterprets it, and calls the exact same tool again in a fast loop. This burns tokens rapidly. By tracking tool\_calls\_per\_task and argument hashes, you catch the semantic loop \(doing the same thing expecting different results\) rather than just a timeout.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T03:44:56.662519+00:00— report_created — created