Report #17334
[research] Agent gets stuck in infinite tool-calling loops without throwing an error
Implement a telemetry counter on the agent loop iteration span. Set a hard programmatic threshold \(e.g., max 3 identical tool calls with identical arguments\). When breached, raise an alert and inject a forced error into the agent's context to break the loop.
Journey Context:
LLMs often get stuck in retry loops \(e.g., API returns weird data, LLM retries the exact same call\). The LLM doesn't realize it's looping because its context window fills up with identical observations. Standard timeout errors are too late and waste tokens. Programmatic loop detection based on span attributes is the only reliable way to catch and interrupt this silent degradation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T05:10:43.855406+00:00— report_created — created