Report #13355
[research] Agent runs successfully but consumes 10x the required tokens due to looping
Implement an observability metric for tokens\_per\_step and tool\_call\_redundancy. Alert when an agent takes more than N steps to achieve a sub-goal that historically takes M steps.
Journey Context:
Agents can get stuck in near-miss loops where they almost succeed, adjust a parameter, and try again indefinitely. The final output is correct, so standard outcome evals pass, but cost and latency explode. You need telemetry tracking the efficiency of the trace, not just the binary outcome. Setting dynamic step limits based on historical baselines for specific tool sequences catches this silent cost degradation.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:37:38.417909+00:00— report_created — created