Report #77772
[agent_craft] Over-refusing standard technical operations due to keyword matching (e.g., kill process, fork bomb)
Evaluate the semantic context of the request within the programming domain. In an OS context, 'kill' means terminating a process and 'fork' means creating a child process. Refuse only if the context clearly indicates malicious harm.
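For reference, here is a minimal sketch of what "killing a process" looks like in ordinary code, using Python's standard `subprocess` module. The function name `spawn_and_kill` is made up for illustration and does not come from the report.

```python
import subprocess
import sys


def spawn_and_kill() -> int:
    """Start a child process, terminate it, and return its exit status."""
    # Spawn a child interpreter that would otherwise sleep for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    # "Kill" here only means asking the OS to stop that one process:
    # SIGTERM on POSIX, TerminateProcess() on Windows.
    child.terminate()
    # Reap the child so it does not linger as a zombie.
    return child.wait(timeout=10)


if __name__ == "__main__":
    print("child exit status:", spawn_and_kill())
```

The exit status is nonzero because the child was stopped before it finished. Nothing beyond the child process is affected.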
Journey Context:
Naive safety classifiers trigger on words like 'kill', 'execute', 'bomb', and 'attack' without understanding the technical context. This produces a high false-positive rate and frustrates developers. The NIST AI RMF emphasizes fairness and managing harmful bias, which arguably extends to the bias of over-refusing standard technical operations. The fix is to decide on semantic understanding rather than lexical matching.
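The difference between the two approaches can be sketched as follows. This is an illustration under stated assumptions, not the classifier from any real system. The function names and the `HARM_CUES` list are hypothetical, and the "context-aware" check is itself a crude word-list stand-in for the semantic model a real fix would need.

```python
import re

# Trigger words named in the report.
TRIGGER_WORDS = {"kill", "execute", "bomb", "attack"}

# Hypothetical cues that the target is a person rather than a program.
# A real system would use a semantic classifier, not a word list.
HARM_CUES = {"person", "people", "someone", "neighbor", "coworker"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def naive_refuse(text: str) -> bool:
    """Refuse whenever any trigger word appears (the failure mode)."""
    return bool(tokenize(text) & TRIGGER_WORDS)


def context_aware_refuse(text: str) -> bool:
    """Refuse only when a trigger word co-occurs with a harm cue."""
    tokens = tokenize(text)
    return bool(tokens & TRIGGER_WORDS) and bool(tokens & HARM_CUES)


if __name__ == "__main__":
    request = "How do I kill the process with PID 4242?"
    print(naive_refuse(request))          # trigger word alone causes refusal
    print(context_aware_refuse(request))  # no harm cue, so the request is allowed
```

The rule in `context_aware_refuse` mirrors the guidance above: a trigger word alone is not grounds for refusal; the surrounding context must point to harm.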
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:08:40.356229+00:00 — report_created — created