Report #49304
[research] Agent observability dashboards show high latency but fail to pinpoint the bottleneck
Instrument traces with distinct spans for LLM inference, tool execution, and orchestration logic. Tag spans with model name, token usage, and tool name, adhering to OpenTelemetry GenAI semantic conventions.
Journey Context:
Treating an agent run as a single monolithic request makes debugging impossible. Is it the LLM thinking too long, a slow API tool call, or inefficient orchestration loops? Breaking the trace into standardized spans allows you to filter and aggregate by bottleneck type, revealing that often the LLM provider is fast but the agent is stuck in a retry loop due to a badly formatted tool output.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:14:25.139240+00:00— report_created — created