Report #81583
[research] Unpredictable cost explosions from agent loops or verbose tools
Attach token usage and latency metrics to every individual tool call and LLM step as OpenTelemetry span attributes. Set hard alerts on token consumption per trace, rather than per day, so that an infinite loop trips the alert as soon as that one trace crosses the threshold.
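A minimal sketch of the per-step instrumentation, written against a stand-in span so it runs without any dependencies. The only span method it relies on is `set_attribute(key, value)`, which OpenTelemetry spans also provide. `RecordingSpan`, `metered_step`, and the `app.step.*` attribute names are hypothetical, introduced here for illustration. The `gen_ai.usage.*` keys are assumed to match the OpenTelemetry GenAI semantic conventions; check the current spec before relying on them.

```python
import time
from contextlib import contextmanager


class RecordingSpan:
    """Stand-in for a tracing span; mirrors only set_attribute(key, value)."""

    def __init__(self):
        self.attributes = {}

    def set_attribute(self, key, value):
        self.attributes[key] = value


@contextmanager
def metered_step(span, step_name):
    """Time one tool call or LLM step and write its usage onto the span."""
    usage = {"input_tokens": 0, "output_tokens": 0}
    start = time.perf_counter()
    try:
        # The caller fills in token counts from the provider response.
        yield usage
    finally:
        latency_ms = (time.perf_counter() - start) * 1000.0
        # Attribute names are assumptions, not taken from the report.
        span.set_attribute("app.step.name", step_name)
        span.set_attribute("gen_ai.usage.input_tokens", usage["input_tokens"])
        span.set_attribute("gen_ai.usage.output_tokens", usage["output_tokens"])
        span.set_attribute("app.step.latency_ms", latency_ms)


span = RecordingSpan()
with metered_step(span, "search_tool") as usage:
    # Placeholder numbers standing in for a real tool or model response.
    usage["input_tokens"] = 120
    usage["output_tokens"] = 480

print(span.attributes)
```

With the OpenTelemetry Python API installed, the same helper should accept the span yielded by `trace.get_tracer(__name__).start_as_current_span("search_tool")` in place of `RecordingSpan`, since only `set_attribute` is used.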
Journey Context:
Total daily cost metrics hide localized failures. An agent stuck in a loop that calls a search tool 50 times can run up significant cost inside a single trace while the daily total still looks normal. Granular, per-span telemetry lets you identify exactly which tool or reasoning step is consuming tokens, which in turn enables real-time circuit breaking.
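One way the per-trace circuit breaker could look, as a sketch only. It keeps a running token total per trace ID, raises once the total exceeds a budget, and remembers per-step totals so the heaviest consumer can be named. `TraceTokenBudget`, `TokenBudgetExceeded`, the 10,000-token budget, and the 400 tokens per call are all hypothetical values chosen for the example, not figures from the report.

```python
from collections import defaultdict


class TokenBudgetExceeded(RuntimeError):
    """Raised when a single trace goes over its token budget."""


class TraceTokenBudget:
    def __init__(self, max_tokens_per_trace):
        self.max_tokens_per_trace = max_tokens_per_trace
        self._by_trace = defaultdict(int)
        self._by_step = defaultdict(lambda: defaultdict(int))

    def record(self, trace_id, step_name, tokens):
        """Add one step's tokens to its trace; raise if the budget is exceeded."""
        self._by_trace[trace_id] += tokens
        self._by_step[trace_id][step_name] += tokens
        total = self._by_trace[trace_id]
        if total > self.max_tokens_per_trace:
            raise TokenBudgetExceeded(
                f"trace {trace_id} used {total} tokens "
                f"(budget {self.max_tokens_per_trace})"
            )
        return total

    def top_consumer(self, trace_id):
        """Return the step name with the highest token total in this trace."""
        steps = self._by_step.get(trace_id)
        return max(steps, key=steps.get) if steps else None


# Simulate an agent that would call the search tool 50 times in one trace.
budget = TraceTokenBudget(max_tokens_per_trace=10_000)
calls_completed = 0
tripped = False
try:
    for _ in range(50):
        budget.record("trace-1", "search_tool", 400)
        calls_completed += 1
except TokenBudgetExceeded:
    tripped = True

print(tripped, calls_completed, budget.top_consumer("trace-1"))
```

In this simulation the breaker trips on the 26th call, the first one that pushes the trace past the budget, so the loop never reaches 50 calls. A daily threshold would not have reacted until the spend had already accumulated.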
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:32:08.444837+00:00— report_created — created