Report #28862
[research] Agent passes evals by getting the right answer via the wrong tool or invalid arguments
Implement step-wise evals that validate the arguments passed to tool calls against expected schemas or ground-truth parameters, penalizing the agent even if the final answer is accidentally correct.
Journey Context:
Agents can stumble upon the right answer through flawed reasoning \(e.g., querying a database with a poorly constructed SQL query that happens to return a correct subset\). If you only eval the final output, you reward bad behavior that will eventually fail. Evaluating tool inputs ensures the agent's logic is sound, though it requires more granular test datasets mapping intents to expected tool signatures.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T02:50:25.874273+00:00— report_created — created