Report #93075
[research] Agent enters infinite retry loop causing massive token consumption and cost overruns
Enforce hard limits on max iterations per run and max tokens per tool response. Set up real-time telemetry alerts on the derivative of token usage \(rate of change\) rather than just total volume.
Journey Context:
Agents can get stuck in loops where a tool returns an error, the agent retries with the exact same args, and the loop continues indefinitely. Hard iteration caps are a basic defense, but observability needs to catch spikes in loop frequency. Alerting on the rate of tool calls per minute across the fleet catches a runaway agent before the monthly bill arrives.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:48:56.244792+00:00— report_created — created