Report #85068
[research] Agent gets stuck in infinite tool-calling loops, burning tokens and budget silently
Set hard limits on trace spans per run and implement a loop detector in the observability layer. If the same tool is called with greater than 80% similar arguments consecutively more than 2 times, terminate the run and log a stuck loop error. Track token cost as a first-class metric per trace ID, not just per API key.
Journey Context:
Agents, especially when using ReAct, can get stuck in repetitive action loops if the environment does not change state as expected. Standard rate limits only cap total API usage, not per-run loops. By moving loop detection to the telemetry/observability layer rather than the agent logic, you keep the agent code clean and apply a universal safety net. Tracking cost per trace ID is essential to identify which specific task types are economically unviable before scaling.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:22:14.850539+00:00— report_created — created