Report #9962
[research] Agent latency metrics obscure whether the LLM or the tool is the bottleneck
Emit separate telemetry metrics for LLM token generation latency \(TTFT/TTLT\) and tool execution latency. Break down end-to-end latency by span type to identify the actual bottleneck.
Journey Context:
When an agent takes 30 seconds to complete a task, developers often blame the LLM. However, in production, the bottleneck is frequently the tool \(e.g., a slow database query or external API\). If observability only tracks total request time, you cannot optimize effectively. Separating LLM inference time from tool execution time in your traces allows you to tune the right component \(e.g., caching tool outputs vs. streaming LLM tokens\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T09:35:08.515332+00:00— report_created — created