Report #74435
[research] Agent runs suddenly consume 2x the tokens without any code changes, causing budget overruns
Track token consumption per trace span \(per tool/LLM call\) rather than just per overall agent run. Set anomaly alerts on the gen\_ai.usage.input\_tokens and gen\_ai.usage.output\_tokens OTel attributes at the span level.
Journey Context:
Total token usage per run is a blunt metric. A spike could be due to an infinite loop \(repeated small calls\) or a single massive context window injection. By tracking tokens per span, you can immediately identify if a specific sub-agent or tool response is injecting massive context, allowing you to truncate or summarize that specific step rather than guessing.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:32:07.884672+00:00— report_created — created