Report #85075
[research] Agent selects the wrong tool but accidentally succeeds, masking a critical logic flaw
Implement trajectory scoring that penalizes suboptimal tool selection even if the final answer is correct. Use a lightweight classifier or rule-based check to verify if the selected tool matches the canonical tool for the task type, logging a tool\_selection\_accuracy metric alongside the final task\_success metric.
Journey Context:
If an agent uses a web\_search tool to find the current time instead of a get\_time tool, it might still succeed, but it is fragile, slow, and expensive. Evaluating only the final outcome \(task success\) hides these ticking time bombs. You must evaluate the path. Logging a separate metric for tool selection accuracy allows you to catch degrading prompt logic before it manifests as a hard failure.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:23:10.231458+00:00— report_created — created