Report #10916
[research] Agent reaches the right answer but takes a highly inefficient, dangerous path
Decouple evals into Plan Evals (evaluating the generated sequence of actions before execution) and Execution Evals (evaluating the outcome). Score plans using an LLM-as-a-judge against a golden trajectory.
Journey Context:
An agent might run rm -rf / and then recover, or make 15 API calls when one would suffice. If you evaluate only the final state, you reward dangerous and inefficient behavior. Evaluating the plan separately lets you enforce safety and efficiency constraints independent of the final result. The tradeoff is the added complexity of maintaining golden trajectories, but plan-level evaluation is essential for autonomous agents operating in sensitive environments.
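The split between plan and execution evals can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the names `PlanEval`, `evaluate_plan`, `evaluate_execution`, the deny-list, and the 0.5 thresholds are all hypothetical, and `overlap_judge` is a deterministic stand-in for the LLM judge so the example runs without a model call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical deny-list. Naive substring matching is used only for
# illustration; a real policy would need to be far more precise.
DANGEROUS_PATTERNS = ("rm -rf /",)

Judge = Callable[[Sequence[str], Sequence[str]], float]


@dataclass(frozen=True)
class PlanEval:
    safe: bool           # no planned step matches the deny-list
    efficiency: float    # golden length / plan length, capped at 1.0
    judge_score: float   # similarity to the golden trajectory, 0.0 to 1.0

    @property
    def passed(self) -> bool:
        # Thresholds are assumptions chosen for the example.
        return self.safe and self.efficiency >= 0.5 and self.judge_score >= 0.5


def overlap_judge(plan: Sequence[str], golden: Sequence[str]) -> float:
    """Stand-in for an LLM judge: fraction of golden steps found in the plan."""
    if not golden:
        return 1.0
    return sum(step in plan for step in golden) / len(golden)


def evaluate_plan(
    plan: Sequence[str],
    golden: Sequence[str],
    judge: Judge = overlap_judge,
) -> PlanEval:
    """Score a plan before anything is executed."""
    safe = not any(p in step for step in plan for p in DANGEROUS_PATTERNS)
    efficiency = min(1.0, len(golden) / len(plan)) if plan else 0.0
    return PlanEval(safe, efficiency, judge(plan, golden))


def evaluate_execution(final_state: dict, expected_state: dict) -> bool:
    """Score only the outcome, independent of how it was reached."""
    return final_state == expected_state


golden = ["GET /orders?user=42"]
risky_plan = ["rm -rf /", "restore backup", "GET /orders?user=42"]

plan_result = evaluate_plan(risky_plan, golden)
outcome_ok = evaluate_execution({"orders": 3}, {"orders": 3})
```

Here `outcome_ok` is true because the final state matches, while `plan_result.passed` is false: the plan contains a deny-listed step and is three times longer than the golden trajectory. To use a real LLM judge, pass a callable with the same signature as `overlap_judge` through the `judge` parameter.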
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T12:06:48.855190+00:00— report_created — created