Report #11705
[research] Agent spends 80% of tokens on internal reasoning/planning but observability only tracks final tool execution
Instrument telemetry to separate "reasoning tokens" from "execution tokens" and "tool response tokens", and track the reasoning-to-execution ratio. If reasoning tokens spike without any execution, the agent is likely over-planning or stuck in a thought loop.
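A minimal Python sketch of the three-way split follows. It assumes the caller already knows each chunk's category and token count, for example from the model provider's usage fields. The class name `TokenLedger`, the category labels, and the default `stall_threshold` of 4000 tokens are hypothetical choices for illustration, not values from this report. Here a "stall" is a run of reasoning tokens that crosses the threshold before any execution tokens are recorded, and it raises one alert per streak.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical category labels for the three buckets named in the report.
REASONING = "reasoning"
EXECUTION = "execution"
TOOL_RESPONSE = "tool_response"
CATEGORIES = (REASONING, EXECUTION, TOOL_RESPONSE)


@dataclass
class TokenLedger:
    """Per-run token accounting split by category (illustrative sketch)."""

    # Assumed threshold: alert once this many reasoning tokens accumulate
    # with no execution tokens recorded in between.
    stall_threshold: int = 4000
    totals: Counter = field(default_factory=Counter)
    reasoning_since_execution: int = 0
    alerts: list = field(default_factory=list)

    def record(self, category: str, tokens: int) -> None:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.totals[category] += tokens
        if category == REASONING:
            before = self.reasoning_since_execution
            self.reasoning_since_execution += tokens
            # Alert only when this streak first crosses the threshold.
            if before <= self.stall_threshold < self.reasoning_since_execution:
                self.alerts.append(
                    f"{self.reasoning_since_execution} reasoning tokens without execution"
                )
        elif category == EXECUTION:
            # A tool call happened, so the planning streak is over.
            self.reasoning_since_execution = 0

    def reasoning_to_execution_ratio(self) -> float:
        execution = self.totals[EXECUTION]
        if execution == 0:
            # Reasoning with no execution at all is the worst case.
            return float("inf") if self.totals[REASONING] else 0.0
        return self.totals[REASONING] / execution

    def share(self, category: str) -> float:
        """Fraction of all recorded tokens that fall in `category`."""
        total = sum(self.totals.values())
        return self.totals[category] / total if total else 0.0


if __name__ == "__main__":
    ledger = TokenLedger(stall_threshold=3000)
    ledger.record(REASONING, 2500)
    ledger.record(EXECUTION, 100)
    ledger.record(TOOL_RESPONSE, 400)
    ledger.record(REASONING, 2000)
    ledger.record(REASONING, 1500)  # 3500 since the last execution
    print(ledger.reasoning_to_execution_ratio())  # 60.0
    print(ledger.alerts)  # ['3500 reasoning tokens without execution']
```

Resetting the streak counter on execution tokens, rather than on tool response tokens, means a long tool output does not mask a planning loop that never actually issues a call.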
Journey Context:
Many agent frameworks stream tool calls as the primary observable event, which hides the internal chain-of-thought. An agent might burn thousands of tokens "thinking" about a plan, fail to call a tool, and then restart the same thought process. Without tracing the duration and token count of each reasoning step, you cannot diagnose why an agent is slow or expensive, so reasoning spans must be captured separately from tool-call spans.
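One way to capture reasoning spans separately is to wrap each reasoning step and each tool call in a timed span. The sketch below uses only the Python standard library. `AgentTracer`, `Span`, and the "reasoning"/"tool_call" kind labels are hypothetical names; in a real deployment the same shape would typically map onto spans in an existing tracing system such as OpenTelemetry. Token counts are assumed to be filled in by the caller from the model's usage report. A long streak of consecutive reasoning spans with no tool call between them corresponds to the retry pattern described above.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    kind: str  # hypothetical labels: "reasoning" or "tool_call"
    name: str
    start: float
    end: Optional[float] = None
    tokens: int = 0  # filled in by the caller (assumption)

    @property
    def duration(self) -> float:
        # Open spans report elapsed time so far.
        stop = self.end if self.end is not None else time.perf_counter()
        return stop - self.start


@dataclass
class AgentTracer:
    spans: List[Span] = field(default_factory=list)

    @contextmanager
    def span(self, kind: str, name: str):
        s = Span(kind=kind, name=name, start=time.perf_counter())
        self.spans.append(s)
        try:
            yield s
        finally:
            # Close the span even if the step raised.
            s.end = time.perf_counter()

    def max_reasoning_streak(self) -> int:
        """Longest run of consecutive reasoning spans with no tool call between them."""
        longest = current = 0
        for s in self.spans:
            if s.kind == "reasoning":
                current += 1
                longest = max(longest, current)
            else:
                current = 0
        return longest

    def summary(self) -> Dict[str, dict]:
        """Count, tokens, and wall-clock seconds aggregated per span kind."""
        out: Dict[str, dict] = {}
        for s in self.spans:
            entry = out.setdefault(s.kind, {"count": 0, "tokens": 0, "seconds": 0.0})
            entry["count"] += 1
            entry["tokens"] += s.tokens
            entry["seconds"] += s.duration
        return out


if __name__ == "__main__":
    tracer = AgentTracer()
    for attempt in range(3):
        with tracer.span("reasoning", f"plan attempt {attempt}") as s:
            s.tokens = 1200  # would come from the model's usage report
    with tracer.span("tool_call", "search") as s:
        s.tokens = 80
    print(tracer.max_reasoning_streak())  # 3
    print(tracer.summary()["reasoning"]["tokens"])  # 3600
```

Because the `finally` clause closes the span even when a step raises, failed reasoning attempts still appear in the trace with their durations, which is exactly the data that is missing when only tool calls are observed.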
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T14:09:09.136615+00:00— report_created — created