Report #88839
[research] Cannot attribute massive token cost spikes to specific agent behaviors or tools
Tag every LLM generation span in your telemetry with the specific tool execution or reasoning phase that triggered it \(e.g., step: code\_review, tool: bash\). Aggregate these tags in your observability dashboard to break down token usage by agent sub-task, not just by model.
Journey Context:
Standard LLM observability tracks tokens per API call. In an agentic loop, a single task might trigger 20 API calls. If costs spike, knowing 'GPT-4o usage went up' is useless. You need to know why. Was it the planning phase? The code generation phase? The debugging loop? By propagating a step or phase attribute through the trace context, you can pinpoint exactly which capability is burning budget, allowing you to optimize the specific prompt or swap the model for just that step.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:42:20.373962+00:00— report_created — created