Report #50046
[research] Agent silently degrades by returning valid tool calls that accomplish the wrong goal without throwing errors
Implement outcome-based assertions in your eval suite, not just structural or exception-based checks. Use a separate 'critic' LLM to verify whether the sequence of tool calls actually achieves the stated user goal, independent of the final output.
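A minimal sketch of the critic step, assuming the agent's tool calls are recorded as name/arguments pairs. The names ToolCall, build_critic_prompt, and critic_verdict are hypothetical, and the judge parameter stands in for whatever LLM client you use; no specific vendor API is assumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolCall:
    """One recorded tool invocation from the agent trace (hypothetical shape)."""
    name: str
    arguments: dict


def build_critic_prompt(goal: str, calls: list[ToolCall]) -> str:
    """Render the user goal and the tool-call sequence for the critic model."""
    lines = [f"{i + 1}. {c.name}({c.arguments})" for i, c in enumerate(calls)]
    return (
        "User goal:\n"
        f"{goal}\n\n"
        "Tool calls made by the agent, in order:\n"
        + "\n".join(lines)
        + "\n\nDo these calls achieve the goal? "
        "Answer PASS or FAIL on the first line, then give a reason."
    )


def critic_verdict(goal: str, calls: list[ToolCall],
                   judge: Callable[[str], str]) -> bool:
    """Return True only if the critic's reply starts with PASS.

    `judge` is an assumption: any function that sends a prompt to a
    separate LLM and returns its text reply.
    """
    reply = judge(build_critic_prompt(goal, calls))
    return reply.strip().upper().startswith("PASS")
```

In an eval, the assertion would be on critic_verdict(...) rather than on whether each call parsed or returned without an exception. The critic is itself a model and can be wrong, so its verdicts are worth spot-checking against cases with known outcomes.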
Journey Context:
Developers often rely on standard observability checks (an HTTP 200 OK, or a response that validates against a JSON schema), which miss semantic failures. For example, an agent might successfully call delete_file on the wrong path: structural validation passes, but the outcome is catastrophic. Outcome-based evals bridge the gap between 'did the code run' and 'did the task succeed'.
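The delete_file case can be sketched as a deterministic check, with no LLM involved. This assumes a tool call shaped as a dict with "name" and "arguments" keys; structural_check, run_tool, outcome_check, and demo are hypothetical names, and everything runs inside a temporary directory.

```python
import os
import tempfile


def structural_check(tool_call: dict) -> bool:
    """Passes if the call is well-formed: expected tool name and a string path."""
    return (
        tool_call.get("name") == "delete_file"
        and isinstance(tool_call.get("arguments", {}).get("path"), str)
    )


def run_tool(tool_call: dict) -> None:
    """Hypothetical executor for the delete_file tool."""
    os.remove(tool_call["arguments"]["path"])


def outcome_check(target: str, must_survive: list[str]) -> bool:
    """Passes only if the intended file is gone and the other files still exist."""
    return not os.path.exists(target) and all(
        os.path.exists(p) for p in must_survive
    )


def demo() -> tuple[bool, bool]:
    """Run a well-formed but wrong call; return (structural result, outcome result)."""
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "old.log")      # file the user wanted removed
        keep = os.path.join(workdir, "config.yaml")    # file that must not be touched
        for path in (target, keep):
            with open(path, "w") as fh:
                fh.write("x")

        # The agent emits a valid call aimed at the wrong file.
        bad_call = {"name": "delete_file", "arguments": {"path": keep}}
        structural_ok = structural_check(bad_call)
        run_tool(bad_call)  # completes without raising
        return structural_ok, outcome_check(target, [keep])
```

Here the structural check accepts the call and the tool runs without error, while the outcome check fails because old.log is still present and config.yaml is gone. Checks on end state like this are cheaper and more repeatable than a critic model wherever the expected state can be written down.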
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:29:23.165851+00:00— report_created — created