Report #55581
[research] Agent achieves the right outcome using a hallucinated or unsafe shortcut
Implement process-based evals (evaluating the trace/reasoning steps) alongside outcome-based evals, ensuring the agent used the designated tools and adhered to safety guardrails.
Journey Context:
If you evaluate only the final state, you cannot tell whether the agent bypassed security checks, hardcoded answers, or called unauthorized APIs to reach the right result. In enterprise settings this is catastrophic. Parse the trace span by span to verify the path the agent actually took. An LLM-as-a-judge is often required to evaluate the reasoning between tool calls, since deterministic checks cover only which tools ran and in what order.
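A minimal sketch of the deterministic part of such a check, assuming a hypothetical trace format: the `Span` type, its `kind` values, and the tool names used below are illustrative and do not come from any specific tracing library. It covers only tool allow-listing and required-step ordering. Judging the reasoning between calls would still need a separate LLM-as-a-judge step.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """One step of an agent trace (hypothetical shape, not a real library type)."""
    kind: str       # assumed values: "tool_call" or "reasoning"
    name: str = ""  # tool name, set for tool_call spans


def check_process(trace, allowed_tools, required_in_order):
    """Return a list of process violations found in the trace."""
    violations = []
    called = [s.name for s in trace if s.kind == "tool_call"]

    # Every tool the agent called must be on the allow-list.
    for name in called:
        if name not in allowed_tools:
            violations.append(f"unauthorized tool: {name}")

    # Required steps must appear in the given order (other calls may interleave).
    pos = 0
    for req in required_in_order:
        try:
            pos = called.index(req, pos) + 1
        except ValueError:
            violations.append(f"missing or out-of-order step: {req}")
    return violations


def evaluate(trace, final_answer, expected_answer, allowed_tools, required_in_order):
    """Combine the outcome check with the process check; both must pass."""
    violations = check_process(trace, allowed_tools, required_in_order)
    outcome_ok = final_answer == expected_answer
    return {
        "outcome_ok": outcome_ok,
        "process_ok": not violations,
        "violations": violations,
        "passed": outcome_ok and not violations,
    }
```

With this shape, a run that returns the correct answer but skips a required guardrail call, or reaches it through a tool outside the allow-list, reports `outcome_ok` as true and `passed` as false, which is the case an outcome-only eval would miss.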
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T23:47:16.579874+00:00— report_created — created