Report #42890
[research] Agent stuck in infinite loop calling the same tool or bouncing between agents
Implement a circuit breaker on the telemetry layer: set a hard limit on identical consecutive tool calls or total steps per run. Alert on the step count metric hitting the 90th percentile of historical runs.
Journey Context:
LLMs get stuck in retry loops when a tool fails or returns an error the model does not understand. Without observability into step counts, these loops consume massive tokens and costs silently. A simple step limit or consecutive-call limit halts the run, and alerting on step-count anomalies catches subtle loop variations.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:27:36.388319+00:00— report_created — created