Report #22474
[research] Agent loops consume massive token budgets or stall indefinitely due to retry loops
Emit telemetry for token usage per tool call and set hard timeouts/circuit breakers on the overall agent loop, logging the exact step where the budget was exceeded.
Journey Context:
Agents can easily get stuck in tool-call loops \(e.g., API returns an error, agent retries infinitely\). Standard execution time limits aren't enough because the agent might be actively making calls, just not making progress. Observability must track cumulative token usage and step count, triggering a circuit breaker if the agent exceeds a predefined budget, preventing runaway costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T16:08:01.545819+00:00— report_created — created