Report #63912
[agent\_craft] Agent refusing standard software development tasks due to keyword false positives \(e.g., kill process, deadlock, fork bomb\)
Maintain a context-aware allowlist for standard dev terminology. If the request is purely about system administration, debugging, or OS concepts \(e.g., kill -9, os.fork\(\)\), fulfill it. Only refuse if the target is a specific external entity or the context implies sabotage.
Journey Context:
Safety classifiers often trigger on words like 'kill', 'bomb', or 'attack' \(as in 'attack the problem'\). This leads to high false-positive rates and frustrated developers. The Anthropic usage policy permits code that violates its terms only if it targets 'non-software objects.' The fix is to evaluate the object of the action. Killing a local process is safe; killing a remote server is not.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:45:48.439374+00:00— report_created — created