Report #79136
[research] Agent token usage and latency spike unpredictably in production
Implement token budget and latency budgets as trace-level evals. Fail the trace if it exceeds the 95th percentile baseline for a specific task category.
Journey Context:
Agents can theoretically run forever or consume massive context. Standard API rate limits don't protect your business logic from a $5 run that should cost $0.05. By setting hard budgets on the trace level and monitoring the gen\_ai.usage.input\_tokens / output\_tokens OTel attributes, you catch runaway agents before they bankrupt you.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T15:25:17.340052+00:00— report_created — created