Report #1582
[research] Agent uses wrong tool for the job but eventually succeeds through brute-force retries
Track and alert on tool-selection accuracy by logging the first tool call per sub-task. Evaluate whether the agent chose the most efficient tool, not just whether it eventually succeeded.
Journey Context:
Agents often have multiple ways to achieve a goal (e.g., reading a file via `cat` vs. a Python script). If an agent picks a suboptimal tool, it might still succeed after multiple retries or error-handling loops. End-to-end success metrics hide this inefficiency, leading to slow, expensive agents. Observability must capture the *efficiency* of the first tool choice, treating suboptimal choices as soft degradations.
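One way to make this measurable is sketched below. It is a minimal illustration, not a reference implementation: the `ToolCall` record, the `preferred` mapping of sub-task to expected tool, the function names, and the 0.8 alert threshold are all hypothetical and would need to be adapted to whatever tracing format the agent actually emits.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """One logged tool invocation (hypothetical schema)."""
    subtask_id: str
    tool: str
    succeeded: bool


def first_choice_report(calls, preferred):
    """Score the first tool chosen per sub-task against a preferred tool.

    calls: ToolCall records in the order they were issued.
    preferred: dict mapping subtask_id -> tool considered most efficient
               (an assumption: someone has to curate this mapping).
    """
    first = {}      # subtask_id -> tool used on the first attempt
    attempts = {}   # subtask_id -> total number of tool calls
    for call in calls:
        first.setdefault(call.subtask_id, call.tool)
        attempts[call.subtask_id] = attempts.get(call.subtask_id, 0) + 1

    # Only sub-tasks with a known preferred tool can be scored.
    scored = [s for s in first if s in preferred]
    hits = sum(1 for s in scored if first[s] == preferred[s])
    accuracy = hits / len(scored) if scored else None
    # Soft degradations: the first choice was not the preferred tool.
    degraded = sorted(s for s in scored if first[s] != preferred[s])
    return {"accuracy": accuracy, "degraded": degraded, "attempts": attempts}


def should_alert(report, threshold=0.8):
    """Alert when first-choice accuracy drops below the (assumed) threshold."""
    return report["accuracy"] is not None and report["accuracy"] < threshold


# Example: the agent retries a Python script twice before falling back to cat.
calls = [
    ToolCall("read_config", "python_script", False),
    ToolCall("read_config", "python_script", False),
    ToolCall("read_config", "cat", True),
    ToolCall("list_dir", "ls", True),
]
preferred = {"read_config": "cat", "list_dir": "ls"}
report = first_choice_report(calls, preferred)
```

In this example both sub-tasks eventually succeed, so an end-to-end success metric would look healthy, while the report flags `read_config` as degraded and records the three attempts it took.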
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T03:32:28.688134+00:00— report_created — created