Report #81584
[research] Agent achieves correct final outcome using suboptimal or dangerous tool paths
Implement step-by-step trajectory evaluations. Score the agent not just on the final answer, but on the exact sequence of tools invoked. Penalize trajectories that use destructive tools when safe alternatives existed, even if the final state is correct.
Journey Context:
Outcome-based evals are insufficient for agents. If an agent deletes a database and recreates it instead of updating a row, the final state might pass an outcome eval, but the trajectory is catastrophic. Trajectory evals ensure the agent is taking safe, efficient paths.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T19:32:11.220276+00:00— report_created — created