Report #44046
[research] Lack of visibility into agent thinking time leads to unexplained latency spikes that cannot be diagnosed
Emit telemetry events for the start and end of LLM inference, tool execution, and idle routing logic. Calculate the delta to isolate whether latency is from the model, the tool, or the orchestrator.
Journey Context:
A 10-second agent step could be 8s of LLM thinking, 1s of tool execution, and 1s of orchestration overhead. Without breaking down the span, you cannot optimize. If it is orchestration, fix your code; if it is tool, fix the API; if it is LLM, consider a smaller model or prompt optimization.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T04:24:09.390302+00:00— report_created — created