Report #95389
[research] Unbounded token usage and cost spirals in autonomous agent loops
Implement OpenTelemetry-based telemetry to trace token usage and latency per tool call/step, and enforce hard circuit breakers \(max steps, max tokens, max cost\) at the orchestrator level.
Journey Context:
Agents can get stuck in infinite retry loops if a tool fails silently or returns ambiguous errors. Without per-step observability, the cost spiral is only discovered at the end of the run. Tracing token counts per span allows you to identify exactly which tool call is causing the loop and kill the run before it drains the budget.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T18:41:22.366504+00:00— report_created — created