Report #52601
[research] Agent runs are slow, but observability dashboards lump all latency together, making it impossible to know if the LLM or the tool is the bottleneck
Instrument traces with distinct span types: gen_ai.llm.inference and gen_ai.tool.execution. Ensure your observability backend calculates p90 latency for each span type independently.
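A minimal sketch of the idea, using only the Python standard library rather than a real tracing SDK. The two span type names are taken from this report as written; `SpanRecorder`, its methods, and the `time.sleep` placeholders are hypothetical stand-ins for your tracer, backend, and real calls. The p90 uses the nearest-rank method, which may differ slightly from what a given backend computes.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Span type names as given in the report; everything else here is illustrative.
LLM_SPAN = "gen_ai.llm.inference"
TOOL_SPAN = "gen_ai.tool.execution"


class SpanRecorder:
    """In-memory stand-in for a tracing backend (hypothetical helper)."""

    def __init__(self):
        self.durations = defaultdict(list)  # span type -> durations in seconds

    @contextmanager
    def span(self, span_type):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the wrapped call raises.
            self.durations[span_type].append(time.perf_counter() - start)

    def p90(self, span_type):
        """Nearest-rank 90th percentile for a single span type."""
        values = sorted(self.durations[span_type])
        if not values:
            return None
        rank = (9 * len(values) + 9) // 10  # ceil(0.9 * n) in integer arithmetic
        return values[rank - 1]


recorder = SpanRecorder()
for _ in range(5):
    with recorder.span(LLM_SPAN):
        time.sleep(0.001)  # placeholder for the model call
    with recorder.span(TOOL_SPAN):
        time.sleep(0.003)  # placeholder for the downstream tool/API call

# Each span type gets its own percentile instead of one lumped number.
for span_type in (LLM_SPAN, TOOL_SPAN):
    print(span_type, "p90 (s) =", recorder.p90(span_type))
```

In a real setup the same separation would come from tagging each span with its type at creation time and grouping by that attribute in the backend's latency query.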
Journey Context:
A slow agent step might be caused by the LLM taking 10s to generate a tool call, or by a downstream API taking 30s to respond. If the trace shows only a single 40s Agent Step, you cannot tell which one dominates: optimizing the prompt won't fix a slow API, and caching API responses won't fix a slow LLM. Separating these spans, named along the lines of the OpenTelemetry semantic conventions, allows targeted optimization and lets evals attribute step-time regressions to the right component.
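The arithmetic above can be sketched as follows, assuming spans are available as simple (span type, seconds) pairs. The 10s and 30s figures mirror the example in this report; the helper names, the baseline and current numbers in the regression check, and the 20% tolerance are hypothetical choices for illustration.

```python
from collections import defaultdict

# One agent step, flattened to (span_type, seconds) pairs.
step_spans = [
    ("gen_ai.llm.inference", 10.0),
    ("gen_ai.tool.execution", 30.0),
]


def breakdown(spans):
    """Sum durations per span type."""
    totals = defaultdict(float)
    for span_type, seconds in spans:
        totals[span_type] += seconds
    return dict(totals)


def bottleneck(spans):
    """Return the span type that accounts for the most time."""
    totals = breakdown(spans)
    return max(totals, key=totals.get)


def regressions(baseline, current, tolerance=0.2):
    """Span types whose latency grew by more than `tolerance` vs. baseline."""
    return [
        span_type
        for span_type, seconds in current.items()
        if span_type in baseline and seconds > baseline[span_type] * (1 + tolerance)
    ]


totals = breakdown(step_spans)
step_total = sum(totals.values())  # 40.0, the only number a lumped trace shows
for span_type, seconds in totals.items():
    print(f"{span_type}: {seconds:.1f}s ({seconds / step_total:.0%} of step)")
print("bottleneck:", bottleneck(step_spans))

# Hypothetical later run: the tool got slower, the LLM barely changed.
later = {"gen_ai.llm.inference": 10.5, "gen_ai.tool.execution": 45.0}
print("regressed:", regressions(totals, later))
```

With the per-type breakdown, a step-time regression check can point at the tool span specifically instead of flagging the whole step.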
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T18:47:14.072209+00:00— report_created — created