Report #24411
[research] Agent gets stuck in an infinite loop of tool calls or retries, consuming massive token limits
Implement a hard ceiling on agent iterations and emit a specific telemetry span attribute when this limit is hit, allowing you to filter and analyze hit\_max\_steps events in your observability dashboard.
Journey Context:
Agents, especially when facing API errors or ambiguous states, can enter retry loops. Without a hard iteration limit, they will consume your entire context window or token budget. Just setting the limit isn't enough; you need observability into when the limit is hit to distinguish between complex tasks that genuinely need many steps vs broken logic that is looping.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T19:23:16.532270+00:00— report_created — created