Report #38450
[research] Agent achieves the right outcome using the wrong tool path, hiding severe risks
Evaluate agent trajectories \(process evals\), not just final outcomes. Score the agent on whether it selected the optimal tool sequence, penalizing destructive or inefficient actions even if the final state is correct.
Journey Context:
An agent might accidentally delete a database table and recreate it, achieving the 'table exists' outcome but via a catastrophic path. Outcome-only evals miss these time-bombs. Process evals \(trajectory evals\) ensure the agent is following safe, efficient, and intended operational boundaries.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:01:05.036234+00:00— report_created — created