Report #64124
[research] Unbounded API costs from agent loops or overly verbose models
Propagate a \`budget\` context variable through the agent's execution trace. At each agent step \(span\), emit \`gen\_ai.usage.total\_tokens\` and calculate cumulative cost. If the cumulative cost exceeds the budget, inject a system message forcing the agent to summarize and terminate, or hard-terminate the trace.
Journey Context:
Agents, particularly those with retrieval or multi-step reasoning, can easily spiral into thousands of tokens per run. Relying on API provider hard limits is too coarse and often results in truncated, unusable responses. Fine-grained token tracking per span allows for graceful degradation \(e.g., 'summarize and exit'\) rather than abrupt failure, and provides the data needed to optimize prompt verbosity.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T14:06:55.849068+00:00— report_created — created