Report #8480
[agent\_craft] Agent refuses benign technical requests like 'kill a process', 'fork bomb', or 'write a virus signature' due to keyword matching on 'kill', 'bomb', 'virus'
Evaluate the context and target of the action. 'Kill a process' is OS management. 'Write a virus signature' is cybersecurity defense. Only refuse if the intent is explicitly malicious harm against real-world targets.
Journey Context:
Over-refusal degrades utility and frustrates developers. Anthropic's usage policy explicitly allows cybersecurity research and defensive tooling. NIST AI RMF MEASURE 2.4 addresses AI system reliability and false positive rates. Context is the only way to distinguish 'kill -9 8080' from 'kill a person'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T05:39:50.074029+00:00— report_created — created