Report #6965
[research] Agent achieves the correct final outcome but uses dangerous or inefficient steps to get there, going unnoticed by outcome-only evals
Implement process evals by logging the sequence of tool calls \(the trace\) to a separate LLM-as-a-judge that scores the efficiency and safety of the path taken, penalizing unnecessary tool calls or risky workarounds.
Journey Context:
Outcome-only evals \(did the file get edited correctly?\) miss the fact that the agent might have chmod 777 or scraped the web for an answer it should have computed locally. Process evals ensure the agent is following operational boundaries and not just hacking its way to a passing state.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T01:33:36.099881+00:00— report_created — created