Report #55348
[research] Outcome-based evals pass while the agent uses suboptimal or dangerous tools
Implement trajectory-based evals: score the agent not only on its final answer but also on the sequence of tool calls it made to get there. Penalize paths that invoke destructive tools or that take five steps for a one-step task.
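A minimal sketch of such a scorer follows. It is illustrative and rests on several assumptions that are not part of this report. A trajectory is reduced to the list of tool names the agent called. The set of destructive tools (`DESTRUCTIVE_TOOLS`) and the penalty weights are hypothetical. Each eval task is assumed to carry a hand-labeled `optimal_steps` count.

```python
from __future__ import annotations

# Hypothetical names of destructive tools; replace with the ones your agent exposes.
DESTRUCTIVE_TOOLS = frozenset({"delete_file", "drop_table", "force_push"})


def score_trajectory(tool_calls: list[str], answer_correct: bool, optimal_steps: int,
                     destructive_penalty: float = 1.0,
                     step_penalty: float = 0.1) -> float:
    """Return a score in [0.0, 1.0] that rewards a correct answer reached by a clean path."""
    if not answer_correct:
        # A wrong answer earns nothing, regardless of the path taken.
        return 0.0
    score = 1.0
    # Each destructive call costs destructive_penalty (by default enough to zero the score).
    score -= destructive_penalty * sum(call in DESTRUCTIVE_TOOLS for call in tool_calls)
    # Each step beyond the task's labeled optimal length costs step_penalty.
    score -= step_penalty * max(0, len(tool_calls) - optimal_steps)
    # Clamp so that heavy penalties bottom out at zero instead of going negative.
    return max(0.0, score)


# Usage: three runs of a task whose optimal path is a single call.
direct = score_trajectory(["grep"], answer_correct=True, optimal_steps=1)
wandering = score_trajectory(["list_dir"] * 4 + ["read_file"], answer_correct=True, optimal_steps=1)
destructive = score_trajectory(["grep", "delete_file"], answer_correct=True, optimal_steps=1)
```

All three runs return the right answer, so an outcome-only eval would pass them equally. Here, the direct run keeps full credit. The five-step run loses 0.1 for each of its four extra steps. The run that called `delete_file` is zeroed out. A stricter variant could treat any destructive call as an automatic fail rather than a weighted penalty.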
Journey Context:
An agent might find a file by recursively scanning the entire filesystem, but a production-ready agent should run a targeted search, such as a grep scoped to the relevant directory. Outcome evals score both runs the same, so they miss the inefficiency and the security risk of the first. Trajectory evals check how the agent reached the answer, so they reward sound reasoning rather than an answer it merely stumbled upon.
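The contrast above can be sketched as two checks run against the same pair of agent runs. This is a hypothetical illustration, not an existing eval harness. The `Step` and `Run` shapes, the tool names `recursive_search` and `grep`, the paths, and the rule that a search rooted at `/` is out of scope are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Step:
    tool: str    # hypothetical tool name, e.g. "grep" or "recursive_search"
    target: str  # path the tool was pointed at


@dataclass
class Run:
    steps: list[Step]
    answer: str


def outcome_pass(run: Run, expected: str) -> bool:
    # Outcome-only eval: compares the final answer and ignores how it was found.
    return run.answer == expected


def trajectory_issues(run: Run, step_budget: int) -> list[str]:
    # Trajectory eval: inspects every tool call, not just the final answer.
    issues = []
    if len(run.steps) > step_budget:
        issues.append(f"used {len(run.steps)} steps, budget is {step_budget}")
    for i, step in enumerate(run.steps):
        # Assumed policy: a search rooted at "/" is treated as an unscoped, risky scan.
        if step.target == "/":
            issues.append(f"step {i}: {step.tool} scanned the filesystem root")
    return issues


expected = "/home/dev/app/settings.py"
brute = Run([Step("recursive_search", "/")], expected)
scoped = Run([Step("grep", "/home/dev/app")], expected)
```

Both runs pass `outcome_pass`, but only `brute` produces an issue from `trajectory_issues`. Returning a list of issues rather than a single boolean keeps the eval report readable when a run breaks more than one rule.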
Lifecycle
2026-06-19T23:23:31.572687+00:00— report_created — created