Report #62757
[research] Agent selects the wrong tool or hallucinates tool parameters
Isolate tool selection as a distinct eval step. Compare the agent's chosen tool and generated arguments against a ground truth of expected tool calls for the input context.
Journey Context:
When an agent fails, the cause is often a wrong tool choice or invalid parameters, an error that occurs before the tool ever executes. Standard end-to-end evals conflate these reasoning errors with tool execution errors. By extracting the tool selection step from the trace and evaluating it independently (e.g., using a classification metric such as accuracy or F1 for tool selection, and JSON Schema validation for parameters), you can see which kind of error is occurring and tune your tool descriptions and prompt to improve selection accuracy.
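The isolated eval step described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the `ToolCall` type, the `TOOL_SPECS` table, and the tool names (`get_weather`, `search_docs`) are all hypothetical, and the hand-rolled `args_valid` check is a simplified stand-in for real JSON Schema validation. It assumes each trace yields exactly one predicted tool call per test case.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ToolCall:
    """One tool invocation pulled from an agent trace (hypothetical shape)."""
    name: str
    args: dict[str, Any]


# Hypothetical parameter specs: required parameter name -> expected Python type.
# A real setup would more likely validate against each tool's JSON Schema.
TOOL_SPECS: dict[str, dict[str, type]] = {
    "get_weather": {"city": str},
    "search_docs": {"query": str, "top_k": int},
}


def args_valid(call: ToolCall) -> bool:
    """Check that the arguments match the spec of the tool the agent chose."""
    spec = TOOL_SPECS.get(call.name)
    if spec is None:
        return False  # the agent named a tool that does not exist
    if set(call.args) != set(spec):
        return False  # missing or hallucinated parameter names
    return all(isinstance(call.args[key], typ) for key, typ in spec.items())


def evaluate(predicted: list[ToolCall], expected: list[ToolCall]) -> dict[str, float]:
    """Score tool selection separately from argument validity."""
    if not expected or len(predicted) != len(expected):
        raise ValueError("need one predicted call per expected call")
    n = len(expected)
    pairs = list(zip(predicted, expected))
    return {
        # Did the agent pick the expected tool?
        "tool_accuracy": sum(p.name == e.name for p, e in pairs) / n,
        # Are the arguments well-formed for the tool it picked?
        "args_valid_rate": sum(args_valid(p) for p in predicted) / n,
        # Did both the tool and the arguments match the ground truth exactly?
        "exact_match": sum(p == e for p, e in pairs) / n,
    }


expected_calls = [
    ToolCall("get_weather", {"city": "Paris"}),
    ToolCall("search_docs", {"query": "refund policy", "top_k": 3}),
    ToolCall("get_weather", {"city": "Oslo"}),
    ToolCall("search_docs", {"query": "api limits", "top_k": 5}),
]
predicted_calls = [
    ToolCall("get_weather", {"city": "Paris"}),                    # correct
    ToolCall("get_weather", {"city": "refund policy"}),            # wrong tool
    ToolCall("get_weather", {"city": "Oslo", "units": "metric"}),  # hallucinated parameter
    ToolCall("search_docs", {"query": "api limits", "top_k": 5}),  # correct
]

metrics = evaluate(predicted_calls, expected_calls)
print(metrics)
```

Reporting the three numbers separately is the point of the exercise: a low `tool_accuracy` suggests the tool descriptions or prompt need work, while a low `args_valid_rate` with high `tool_accuracy` points at parameter documentation instead. Per-tool precision, recall, or F1 could be computed from the same pairs if some tools are much rarer than others.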
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T11:49:15.411054+00:00— report_created — created