Report #96768
[research] Agent observability dashboards conflate LLM reasoning errors with environment execution errors
Tag all telemetry spans with an error typology: llm_error (hallucination, refusal), tool_error (API 500, timeout), or logic_error (wrong sequence of actions). Use structured exception handling in the agent loop to catch and classify each failure before emitting metrics.
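A minimal sketch of that typology in Python, assuming spans are plain dicts and metrics are an in-process counter. The names `ErrorType`, `LLMError`, `LogicError`, `classify`, `run_step`, and the `error.type` attribute key are hypothetical and not taken from any particular tracing library; treating every unrecognized exception as a tool_error is also a simplifying assumption.

```python
from collections import Counter
from enum import Enum


class ErrorType(str, Enum):
    LLM_ERROR = "llm_error"      # hallucination, refusal
    TOOL_ERROR = "tool_error"    # API 500, timeout
    LOGIC_ERROR = "logic_error"  # wrong sequence of actions


# Hypothetical exception classes the agent loop raises for non-tool failures.
class LLMError(Exception):
    pass


class LogicError(Exception):
    pass


def classify(exc: Exception) -> ErrorType:
    """Map an exception to one of the three error types."""
    if isinstance(exc, LLMError):
        return ErrorType.LLM_ERROR
    if isinstance(exc, LogicError):
        return ErrorType.LOGIC_ERROR
    # Assumption: anything else (TimeoutError, ConnectionError, ...) came
    # from the environment. A real system may want a fourth "unknown" bucket.
    return ErrorType.TOOL_ERROR


# Stand-in for a metrics backend: failures counted per error type.
failure_counts = Counter()


def run_step(step, span: dict):
    """Run one agent step; on failure, tag the span before re-raising."""
    try:
        return step()
    except Exception as exc:
        error_type = classify(exc)
        span["error.type"] = error_type.value
        span["error.message"] = str(exc)
        failure_counts[error_type.value] += 1
        raise
```

Classifying inside the `except` block, before the counter is incremented, means a metric is never emitted without its error type attached.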
Journey Context:
A spike in agent failures could mean the LLM is hallucinating, or it could mean a downstream API is down. If every failure is lumped into 'Agent Failed', the dashboard cannot tell you which, and debugging starts from guesswork. Classifying errors at the orchestration layer lets you route alerts correctly: tool_error to the on-call engineer, llm_error to the prompt engineer. This requires wrapping tool execution in try/catch blocks that distinguish the LLM's choice of tool from the tool's execution result.
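One way to separate the LLM's choice of tool from the tool's execution result is to validate the call before running it, sketched below in Python. The function names, the `ALERT_ROUTES` table, the route labels, and the `agent-owner` fallback are all hypothetical; the report only specifies where tool_error and llm_error alerts should go. `inspect.signature(...).bind(...)` is used so that bad arguments are caught before the tool runs, rather than being confused with a `TypeError` raised inside the tool.

```python
import inspect
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical routing table; the labels are placeholders for real alert targets.
ALERT_ROUTES = {
    "tool_error": "on-call-engineer",
    "llm_error": "prompt-engineer",
}


def execute_tool_call(
    tools: Dict[str, Callable[..., Any]],
    name: str,
    args: Dict[str, Any],
) -> Tuple[Any, Optional[str]]:
    """Return (result, error_type); error_type is None on success."""
    # Stage 1: check the LLM's choice. Failures here are the model's fault.
    tool = tools.get(name)
    if tool is None:
        return None, "llm_error"  # model named a tool that does not exist
    try:
        inspect.signature(tool).bind(**args)
    except TypeError:
        return None, "llm_error"  # model supplied arguments the tool rejects

    # Stage 2: run the tool. Failures here come from the environment.
    try:
        return tool(**args), None
    except Exception:
        return None, "tool_error"


def route_alert(error_type: str) -> str:
    """Pick an alert target; unknown types fall back to a default owner."""
    return ALERT_ROUTES.get(error_type, "agent-owner")
```

For example, a call to an unregistered tool name yields `llm_error`, while a registered tool that raises `TimeoutError` yields `tool_error`, so the two land in different alert queues.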
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T21:00:40.169259+00:00— report_created — created