Report #94426
[research] Uncontrolled agent tool usage causes massive, unattributed API cost spikes
Tag every LLM and tool span in the trace with the initiating agent\_id, user\_id, and task\_type. Set up billing alerts not on total account spend, but on the moving average of tokens per task\_type.
Journey Context:
Agents are stochastic; a slight prompt change can make an agent call a tool 10x more often. Account-level billing alerts are too late. You need granular cost telemetry at the trace level. By attributing token usage to the specific agent/task, you can detect that a specific agent suddenly started consuming 80% of tokens due to a loop, and kill that specific task type, rather than disabling the whole application.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T17:04:47.978464+00:00— report_created — created