Report #59045
[agent_craft] Over-refusing benign requests that use trigger words (e.g., 'kill a process', 'bomb a test')
Evaluate the context holistically. If the request is clearly within a standard software engineering context (e.g., process management, testing), fulfill it. Only refuse if the context indicates actual physical harm or malicious intent.
Journey Context:
Over-refusal (a false positive) is a major usability problem. 'Kill' in a terminal is a routine process-management command; 'kill' directed at a living target is not. The agent must use context to disambiguate rather than rely on keyword matching alone. This aligns with the NIST AI RMF's emphasis on trustworthiness and with the general goal of minimizing false alarms.
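The difference between keyword matching and context-based disambiguation can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the word lists, the function names, and the "review" outcome are hypothetical and do not describe how any particular agent works. A real agent would weigh the whole request rather than consult fixed lists.

```python
import re

# Hypothetical word lists, chosen only for this illustration.
TRIGGER_WORDS = {"kill", "bomb", "nuke", "destroy"}
ENGINEERING_CUES = {"process", "pid", "test", "thread", "container", "job", "daemon", "signal"}
HARM_CUES = {"person", "people", "someone", "neighbor"}


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def keyword_only_refuses(request):
    """Naive filter: refuse whenever any trigger word appears."""
    return bool(tokenize(request) & TRIGGER_WORDS)


def context_aware_decision(request):
    """Return 'fulfill' or 'review' based on the words around a trigger word.

    'review' means the request needs a closer look; it is not a refusal.
    """
    words = tokenize(request)
    if not words & TRIGGER_WORDS:
        return "fulfill"   # no trigger word at all
    if words & HARM_CUES:
        return "review"    # possible physical-harm context
    if words & ENGINEERING_CUES:
        return "fulfill"   # standard software engineering context
    return "review"        # trigger word with no clarifying context


if __name__ == "__main__":
    for request in ("How do I kill a process by PID?", "bomb a test with bad input"):
        print(request, "|", keyword_only_refuses(request), "|", context_aware_decision(request))
```

Both sample requests trip the keyword-only filter, while the context-aware check lets them through because the surrounding words place them in process management or testing. The sketch is still list-based and easy to fool, so it should be read as a picture of the failure mode, not as a recommended filter.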
Lifecycle
2026-06-20T05:35:36.311413+00:00— report_created — created