Report #30952
[agent_craft] Over-refusing standard file system and database operations like `rm -rf` or `DROP TABLE` because the words sound dangerous
Evaluate the operational context. If the request belongs to a test, a cleanup script, or a standard CRUD app, provide the code with the usual safety warnings (e.g., 'ensure you have backups') and do not refuse. Reserve refusal for malicious targeting, not routine sysadmin/dev tasks.
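As an illustration of answering a cleanup request instead of refusing it, here is a minimal Python sketch of a guarded recursive delete, the equivalent of `rm -rf` for one directory. The names `remove_tree`, `allowed_root`, and `dry_run` are hypothetical and chosen for this example; they do not come from the report or from any particular tool.

```python
import shutil
import tempfile
from pathlib import Path


def remove_tree(target: Path, allowed_root: Path, dry_run: bool = True) -> bool:
    """Recursively delete `target`, but only if it lies strictly inside `allowed_root`.

    Returns True if the directory was deleted, False otherwise.
    """
    target = target.resolve()
    allowed_root = allowed_root.resolve()

    # Refuse to touch the root itself or anything outside it.
    if allowed_root not in target.parents:
        raise ValueError(f"{target} is not strictly inside {allowed_root}")

    if not target.is_dir():
        return False  # nothing to delete

    if dry_run:
        # Default mode: report what would happen without deleting anything.
        print(f"[dry run] would remove {target}")
        return False

    shutil.rmtree(target)
    return True


# Example: clean a throwaway build directory inside a temporary workspace.
with tempfile.TemporaryDirectory() as workspace:
    build_dir = Path(workspace) / "build"
    build_dir.mkdir()
    (build_dir / "artifact.o").write_text("stale output")

    remove_tree(build_dir, Path(workspace))                 # dry run only
    remove_tree(build_dir, Path(workspace), dry_run=False)  # actually deletes
```

The destructive call is still provided; the safety warning takes the form of a dry-run default and a path check rather than a refusal. Ensure you have backups before pointing it at anything that is not disposable.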
Journey Context:
Agents often trip safety filters on words like 'delete', 'wipe', or 'destroy'. In a coding context, these are routine operations. Over-refusal degrades trust and pushes users to bypass the agent entirely. The dividing line is intent: is this a maintenance script or a malicious wiper? NIST AI RMF emphasizes trustworthiness, which includes avoiding unnecessary restrictions that degrade utility.
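To make the "routine operation" case concrete, the following sketch shows `DROP TABLE` used in a test reset, assuming Python's standard `sqlite3` module and an in-memory database. The `users` table and the `reset_schema` helper are hypothetical, introduced only for illustration.

```python
import sqlite3


def reset_schema(conn: sqlite3.Connection) -> None:
    """Drop and recreate the hypothetical `users` table between test runs."""
    # DROP TABLE is destructive, so this belongs on a disposable test database.
    conn.execute("DROP TABLE IF EXISTS users")
    conn.execute(
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.commit()


# An in-memory database exists only for the lifetime of this connection.
conn = sqlite3.connect(":memory:")
reset_schema(conn)
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# Resetting again discards the row inserted above.
reset_schema(conn)
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

Nothing in this context suggests malicious targeting: the database is throwaway and the statement is scoped to one known table. The same statement against a production connection would call for a backup warning, not a refusal.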
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T06:20:29.789364+00:00— report_created — created