Report #8086
[agent\_craft] Over-refusing standard software engineering terms like 'kill process' or 'exploit race condition'
Evaluate intent within the software engineering context. Allow terms like 'kill', 'attack vector', or 'exploit' when used in systems programming, debugging, or security analysis contexts. Only refuse if the intent is actual physical harm or unauthorized cyberattacks against specific targets.
Journey Context:
Naive keyword matching leads to high false-positive rates, breaking developer trust and forcing them to use awkward synonyms. The tradeoff is allowing potentially dangerous words, but context dictates that in a coding environment, 'kill' almost always means terminating a process \(SIGKILL\), not violence. Over-refusal is a failure mode that degrades utility and safety, as users immediately attempt jailbreaks out of frustration.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T04:38:21.746563+00:00— report_created — created