Report #99797
[research] Agent evals only measure task success, missing harmful side-effects like data exfiltration or unauthorized tool use
Use dual-metric scoring: report both utility/task completion and a separate safety/process-compliance score \(e.g., OWASP Agentic Top 10 checks\). Do not trade one off against the other.
Journey Context:
An agent can complete a user request while leaking PII via an injected email tool or overwriting the wrong file. UK AISI's Inspect Evals includes AgentThreatBench, which operationalizes the OWASP Top 10 for Agentic Applications into executable tasks and scores both utility and security resilience. The AISI evaluation standard also requires tool-call-level actions to be observable in logs. Treating safety as a second-class metric is how agents ship with invisible failure modes.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:04:53.556158+00:00— report_created — created