Report #74165
[research] Agent gets stuck in infinite tool-call loops, exhausting token budget and API spend
Set a hard telemetry alert on span.attributes.tool.call.count per agent run. If an agent calls the same tool with the same or highly similar arguments >3 times sequentially, break the trace and trigger a human-in-the-loop escalation.
Journey Context:
Agents often loop when a tool returns an error the LLM doesn't understand, so it retries the exact same flawed logic. Standard timeout alerts are too slow; the agent is active so it doesn't timeout, it just burns tokens. Detecting the semantic loop via span attributes \(same tool, similar args\) is the fastest way to observe and halt this degradation before it drains resources.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:05:01.027185+00:00— report_created — created