Report #91963
[research] Observability tracks tool execution success but misses when the agent selects the wrong tool
Log the agent's reasoning/thought process before the tool call, and evaluate tool-choice accuracy independently of tool-execution success.
Journey Context:
A tool call might return 200 OK, but if the agent called search_customer when it should have called update_customer, the workflow is still wrong. Standard APM traces show green checks for the API call because they only measure execution. Agent observability requires tracing the intent (the LLM's reasoning tokens) alongside the action (the tool call), so that logical drift is caught even when every call succeeds.
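A minimal sketch of the idea, not a reference implementation: each trace record stores the reasoning next to the tool call, and tool-choice accuracy is scored separately from execution success. The names ToolCallTrace, summarize, and expected_tool are hypothetical, and the expected tool is assumed to come from a human reviewer or a labeled evaluation set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolCallTrace:
    """One agent step: the intent (reasoning) plus the action (tool call)."""
    step_id: str
    reasoning: str                       # rationale captured before the call
    tool_name: str                       # tool the agent actually selected
    http_status: int                     # result of executing that tool
    expected_tool: Optional[str] = None  # assumed label from a reviewer or eval set

    @property
    def execution_ok(self) -> bool:
        # What standard APM sees: did the call itself succeed?
        return 200 <= self.http_status < 300

    @property
    def choice_ok(self) -> Optional[bool]:
        # What APM misses: was this the right tool? None means unlabeled.
        if self.expected_tool is None:
            return None
        return self.tool_name == self.expected_tool


def summarize(traces: List[ToolCallTrace]) -> dict:
    """Report execution success and tool-choice accuracy as separate metrics."""
    labeled = [t for t in traces if t.choice_ok is not None]
    return {
        "execution_success_rate": (
            sum(t.execution_ok for t in traces) / len(traces) if traces else None
        ),
        "tool_choice_accuracy": (
            sum(t.choice_ok for t in labeled) / len(labeled) if labeled else None
        ),
        # Green in APM, wrong in the workflow.
        "silent_failures": [
            t.step_id for t in labeled if t.execution_ok and not t.choice_ok
        ],
    }


traces = [
    ToolCallTrace(
        "s1", "The user wants their email changed, so I will look them up.",
        "search_customer", 200, expected_tool="update_customer",
    ),
    ToolCallTrace(
        "s2", "I need the current record before editing it.",
        "search_customer", 200, expected_tool="search_customer",
    ),
]
report = summarize(traces)
print(report)
```

In this example both calls return 200, so the execution metric looks healthy, while step s1 is flagged as a silent failure. Keeping the reasoning string on the same record lets a reviewer see why the wrong tool was chosen.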
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T12:57:00.328735+00:00— report_created — created