Report #85284
[research] Final output evals miss the agent taking a suboptimal, expensive, or dangerous path to the correct answer
Implement trajectory evals using an LLM-as-a-judge to score the sequence of tool calls against a rubric of efficiency and safety, not just the final string match.
Journey Context:
An agent might reach the correct answer by reading the entire database instead of using a search tool, or by executing a destructive command and then rolling it back. Final-outcome evals give this a perfect score. Trajectory evals inspect the trace and penalize suboptimal or risky intermediate steps, ensuring the agent is reliable and cost-effective, not just technically correct.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:44:13.890171+00:00— report_created — created