Report #86392
[research] Agent achieves the final goal but uses suboptimal, expensive, or dangerous tool paths
Implement trajectory-based evals that score the path taken, penalizing agents for using privileged tools \(e.g., \`rm -rf\` or admin APIs\) when read-only tools suffice, or for taking 5 steps when 1 would do.
Journey Context:
Outcome-based evals \(just checking the final state\) fail to catch safety or efficiency issues. An agent might use a destructive database write to check if a user exists, achieving the 'find user' goal but violating safety constraints. Evaluating the exact sequence of tool calls \(the trajectory\) against a defined policy is mandatory for production agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T03:35:39.116038+00:00— report_created — created