Report #14239
[research] Observability only tracks tool execution success/failure, missing tool selection errors
Log the agent's reasoning step (chain of thought) immediately before each tool call, alongside the tool's execution result. Then evaluate whether the selected tool was the best choice for the query, even when the tool executed successfully.
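A minimal sketch of the logging side, assuming a Python agent loop where the model's reasoning text is available as a string before each call. `ToolCallSpan`, `traced_tool_call`, `emit`, and the stand-in `search_database` function are hypothetical names used for illustration, not part of any specific tracing library; in practice the span would go to whatever trace backend is already in use.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, Callable, Dict, Optional, Tuple


@dataclass
class ToolCallSpan:
    """One trace record: the agent's stated reasoning plus the tool outcome."""
    query: str
    reasoning: str            # chain-of-thought text captured BEFORE the call
    tool_name: str
    tool_args: Dict[str, Any]
    success: bool = False
    error: Optional[str] = None
    latency_ms: float = 0.0
    result_preview: str = ""


def traced_tool_call(query: str, reasoning: str, tool_name: str,
                     tool: Callable[..., Any],
                     tool_args: Dict[str, Any]) -> Tuple[Any, ToolCallSpan]:
    """Run `tool` and return its result with a span linking decision and outcome."""
    # Create the span before the call so the reasoning is kept even if the tool fails.
    span = ToolCallSpan(query, reasoning, tool_name, dict(tool_args))
    start = time.perf_counter()
    result: Any = None
    try:
        result = tool(**tool_args)
        span.success = True
        span.result_preview = str(result)[:200]
    except Exception as exc:  # record the failure instead of losing the span
        span.error = repr(exc)
    finally:
        span.latency_ms = (time.perf_counter() - start) * 1000.0
    return result, span


def emit(span: ToolCallSpan) -> str:
    """Serialize the span as one JSON line (hypothetical sink; replace with a real one)."""
    return json.dumps(asdict(span), sort_keys=True)


# Usage with a stand-in tool (hypothetical, for illustration only).
def search_database(key: str) -> dict:
    return {"key": key, "source": "db"}


result, span = traced_tool_call(
    query="Get profile for user 42",
    reasoning="Profile data lives in the DB, so I'll query it directly.",
    tool_name="search_database",
    tool=search_database,
    tool_args={"key": "user:42"},
)
print(emit(span))
```

The main design choice is creating the span before the tool runs: the reasoning is attached whether the call succeeds or raises, so tool selection can be evaluated for both outcomes.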
Journey Context:
An agent might successfully execute a search_database tool when a read_cache tool would have returned the same correct answer 10x faster. If observability only tracks the tool's 200 OK, this sub-optimal choice is invisible. Capturing the pre-tool reasoning lets evals score the decision, not just the execution.
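Once reasoning and outcome live in the same record, an eval can score them as two separate signals. The sketch below assumes a hand-labeled set that maps each query to the tool judged optimal for it; `LoggedStep`, `StepScore`, `score_step`, `summarize`, and the label set are hypothetical. It uses a simple exact-match check on the tool name as a stand-in for a richer grader (for example, a model-graded judge that reads the logged reasoning), and it surfaces the case described above: a call that succeeded but used the wrong tool.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LoggedStep:
    """A trace record assumed to hold the pre-call reasoning and the outcome."""
    query: str
    reasoning: str
    tool_name: str
    success: bool  # what a success-only dashboard sees (e.g. 200 OK)


@dataclass
class StepScore:
    execution_ok: bool
    selection_ok: Optional[bool]  # None when the query has no label
    note: str


def score_step(step: LoggedStep, preferred_tool: Dict[str, str]) -> StepScore:
    """Score execution and tool selection independently."""
    expected = preferred_tool.get(step.query)
    if expected is None:
        return StepScore(step.success, None, "unlabeled; selection not scored")
    if step.tool_name == expected:
        return StepScore(step.success, True, "optimal tool")
    return StepScore(step.success, False,
                     f"chose {step.tool_name}, expected {expected}")


def summarize(steps: List[LoggedStep], preferred_tool: Dict[str, str]) -> dict:
    scores = [score_step(s, preferred_tool) for s in steps]
    labeled = [sc for sc in scores if sc.selection_ok is not None]
    # Steps a success-only metric would miss: the call succeeded,
    # but the tool was not the labeled optimal choice.
    hidden = [(s.reasoning, sc.note) for s, sc in zip(steps, scores)
              if sc.execution_ok and sc.selection_ok is False]
    return {
        "execution_success_rate":
            sum(sc.execution_ok for sc in scores) / len(scores) if scores else None,
        "selection_accuracy":
            sum(bool(sc.selection_ok) for sc in labeled) / len(labeled) if labeled else None,
        "hidden_selection_errors": hidden,
    }


if __name__ == "__main__":
    labels = {"Get profile for user 42": "read_cache"}  # hypothetical label set
    steps = [LoggedStep("Get profile for user 42",
                        "Profile data lives in the DB, so I'll query it directly.",
                        "search_database", True)]
    report = summarize(steps, labels)
    print(report["execution_success_rate"])  # 1.0
    print(report["selection_accuracy"])      # 0.0
    print(report["hidden_selection_errors"])
```

A success-only view of this trace reports a perfect run; the selection score, together with the attached reasoning, is what exposes the search_database vs read_cache miss and shows why the agent made it.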
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T21:07:48.237691+00:00 - report_created - created