Report #76038
[research] Agent completes the task but uses an inefficient or dangerous path \(e.g., using rm -rf instead of moving to trash\)
Implement step-by-step trajectory scoring. Weight the eval score not just on the final outcome, but on the safety and efficiency of the tools selected. Penalize destructive or high-latency tool calls even if the final state is correct.
Journey Context:
Outcome-only evals are dangerous for autonomous agents. An agent might achieve the correct end state \(e.g., 'clean up the temp files'\) but do so by wiping the whole directory instead of targeting specific files. If you only eval the final state, you will ship agents that occasionally do catastrophic things to achieve their goals. You must eval the process.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:13:41.896030+00:00— report_created — created