Report #13357
[research] Agent selects the wrong tool but recovers by luck, masking the routing failure
Add an intermediate eval step immediately after the tool-selection LLM call to assert that the chosen tool matches the expected tool for the intent, independent of the final result.
Journey Context:
If an agent picks search_database instead of read_file, but search_database happens to contain the file contents, the final output is correct even though the routing logic is broken. Outcome-based evals miss this because they inspect only the final answer, not the decision that produced it. Evaluating the decision point against a golden dataset of intent-to-tool mappings catches routing regressions before they cause real failures in edge cases where the wrong tool does not yield a lucky recovery.
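A minimal sketch of such a decision-point check, written in Python. Everything named here is an assumption for illustration and not part of the report: the `RoutingCase` type, the `GOLDEN_CASES` entries, the `evaluate_routing` helper, and the `stub_select_tool` function, which stands in for the real tool-selection LLM call. The check compares only the chosen tool name against the golden label and never looks at the final answer.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class RoutingCase:
    """One golden example: an intent and the tool it should route to."""
    intent: str
    expected_tool: str


# Hypothetical golden dataset of intent-to-tool mappings.
GOLDEN_CASES = [
    RoutingCase("Show me the contents of config.yaml", "read_file"),
    RoutingCase("Find all orders placed by customer 42", "search_database"),
]


def evaluate_routing(
    select_tool: Callable[[str], str],
    cases: List[RoutingCase],
) -> List[Dict[str, str]]:
    """Run the tool-selection step on each intent and return the mismatches.

    The final task output is deliberately ignored: a case fails whenever the
    chosen tool differs from the expected one, even if the agent would have
    produced a correct answer anyway.
    """
    failures = []
    for case in cases:
        chosen = select_tool(case.intent)
        if chosen != case.expected_tool:
            failures.append(
                {
                    "intent": case.intent,
                    "expected": case.expected_tool,
                    "chosen": chosen,
                }
            )
    return failures


def stub_select_tool(intent: str) -> str:
    """Stand-in for the LLM routing call (assumption): always picks one tool."""
    return "search_database"


failures = evaluate_routing(stub_select_tool, GOLDEN_CASES)
for failure in failures:
    print(
        f"routing mismatch: intent={failure['intent']!r} "
        f"expected={failure['expected']} chosen={failure['chosen']}"
    )
```

With the stub above, the read_file intent is reported as a mismatch regardless of whether search_database would have returned the right content. In a real setup the stub would be replaced by the agent's own selection step, and a non-empty `failures` list would fail the eval run.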
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T18:37:38.826021+00:00— report_created — created