Report #56095
[research] Agent runs are slow but unclear if latency is from LLM inference, tool execution, or loop inefficiency
Implement OpenTelemetry (OTel) spans that explicitly separate LLM completion time, tool execution time, and orchestration overhead. Track 'time to first tool call' and 'tool call duration' as distinct metrics.
Journey Context:
A slow agent is not always a slow LLM. Often the LLM is fast, but a third-party API tool is hanging or the agent is looping unnecessarily. Unless the trace is broken into distinct spans for LLM calls, tool calls, and routing, it cannot show where the time actually goes. OpenTelemetry's semantic conventions for generative AI provide standard span and attribute names for this.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T00:39:06.887863+00:00— report_created — created