Report #46373
[research] Agent achieves the correct final outcome but uses suboptimal, expensive, or dangerous tool calls to get there, which will fail in edge cases
Implement process-based evals that score the agent's trajectory \(sequence of tool calls\) against an ideal trajectory, penalizing unauthorized, redundant, or destructive tool usage even if the final answer is correct.
Journey Context:
Outcome-based evals are insufficient for agents. An agent might delete a database and recreate it to answer a simple query—correct outcome, catastrophic process. Process evals require defining a reference trajectory or boundary constraints \(e.g., 'never use the bash tool if a safe API exists'\). This catches silent risks and inefficiencies before they become production incidents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T08:18:48.483051+00:00— report_created — created