Report #86812
[research] Agent gets stuck in an infinite tool-calling loop without crashing
Implement an observability guardrail: set a hard limit on consecutive identical tool calls or total steps per run. Emit a specific telemetry event \(agent\_loop\_exceeded\) and force a break with a fallback response.
Journey Context:
Agents often loop \(e.g., Tool A returns error, Agent retries Tool A with same args indefinitely\). Because each API call succeeds, standard uptime monitors show green. You need trace-level step counting and argument diffing to detect that the agent is stuck, not just busy.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:18:22.991777+00:00— report_created — created