Report #75287
[research] Agent runs are slow, but it's unclear if latency is from the LLM or the tools
Instrument agent runs with distinct spans for 'LLM inference' vs 'Tool execution'. Calculate the ratio of tool-time to think-time to identify bottlenecks. Optimize tool latency (e.g., caching, pagination) before trying to optimize the LLM prompt.
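A minimal sketch of the span split, using only the Python standard library. `SpanTimer`, the span names `"llm"` and `"tool"`, and the two `fake_*` functions are hypothetical stand-ins and not part of any tracing library. A real setup would more likely emit these as spans through whatever tracing backend is already in use.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class SpanTimer:
    """Accumulates wall-clock time per span kind (hypothetical helper)."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock               # injectable so tests can fake time
        self.totals = defaultdict(float)  # span kind -> total seconds

    @contextmanager
    def span(self, kind):
        start = self._clock()
        try:
            yield
        finally:
            # Record the duration even if the wrapped call raises.
            self.totals[kind] += self._clock() - start

    def tool_share(self):
        """Fraction of measured time spent in tools (0.0 if nothing was measured)."""
        tool = self.totals["tool"]
        llm = self.totals["llm"]
        total = tool + llm
        return tool / total if total else 0.0


def fake_llm_call():
    time.sleep(0.01)  # stand-in for model inference


def fake_tool_call():
    time.sleep(0.04)  # stand-in for a slow external API call


timer = SpanTimer()
for _ in range(3):  # three agent steps: think, then act
    with timer.span("llm"):
        fake_llm_call()
    with timer.span("tool"):
        fake_tool_call()

print(f"tool share of latency: {timer.tool_share():.0%}")
```

If the tool share dominates, the tools are the place to look first. If it is small, prompt or model changes are more likely to pay off.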
Journey Context:
Developers often blame the LLM for slow agent runs and respond by optimizing prompts or switching models. However, observability often reveals that 80% of the latency is spent waiting on external API calls (e.g., web scraping, database queries). Without separating think-time from tool-time in traces, optimization efforts are misdirected.
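When the wait is on repeated external calls, caching tool results is one option. The sketch below assumes the tool is safe to cache for a short window. `ttl_cache` and `fetch_page` are hypothetical names and the URL is a placeholder. Only positional, hashable arguments are supported.

```python
import time
from functools import wraps


def ttl_cache(ttl_seconds, clock=time.monotonic):
    """Cache results per argument tuple for ttl_seconds (hypothetical helper)."""
    def decorator(fn):
        store = {}  # args -> (timestamp, value)

        @wraps(fn)
        def wrapper(*args):
            now = clock()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]          # fresh entry: skip the slow call
            value = fn(*args)
            store[args] = (now, value)
            return value
        return wrapper
    return decorator


calls = []  # records every real (uncached) tool invocation


@ttl_cache(ttl_seconds=60)
def fetch_page(url):
    calls.append(url)
    time.sleep(0.05)  # stand-in for a slow scrape or query
    return f"<html>content of {url}</html>"


fetch_page("https://example.com/docs")
fetch_page("https://example.com/docs")  # served from the cache
print(len(calls))  # → 1
```

Re-measure the tool share after adding a cache, since the gain depends on how often the agent repeats the same call.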
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T08:57:59.475310+00:00— report_created — created