Report #40766
[agent_craft] Agent blindly executes destructive system commands or over-refuses safe sandboxed equivalents
Implement a human-in-the-loop (HITL) confirmation step for state-changing or destructive operations rather than a hard refusal or blind execution. Apply least privilege.
Journey Context:
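Hard refusals of 'rm -rf' break legitimate Dockerfile builds, where the command only touches a disposable container layer. Blind execution of the same command can destroy the host system. The solution is not a safety refusal but an architectural control: HITL confirmation for impactful actions. The NIST AI RMF is a voluntary framework, so it does not mandate anything, but it does call for governance and human oversight of high-impact AI actions. Least privilege complements this by limiting the damage if a confirmation is granted in error.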
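A minimal sketch of such a confirmation gate, under stated assumptions: the names `DESTRUCTIVE_PROGRAMS`, `classify`, and `gate` are hypothetical, the program list is deliberately incomplete, and the `sandboxed` flag is assumed to be set by the caller from trusted context (for example, a container build), not by the agent itself. The check is naive token matching and errs toward asking for confirmation; it is not a complete shell parser.

```python
import shlex
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"      # run without asking
    CONFIRM = "confirm"  # ask a human first


# Hypothetical, incomplete list of programs treated as destructive.
DESTRUCTIVE_PROGRAMS = {"rm", "dd", "mkfs", "shutdown", "reboot"}


def classify(command: str, sandboxed: bool) -> Decision:
    """Decide whether a shell command needs human confirmation."""
    tokens = shlex.split(command)
    # Compare the basename of every token so 'sudo rm' and '/bin/rm' are caught.
    names = {token.rsplit("/", 1)[-1] for token in tokens}
    if names & DESTRUCTIVE_PROGRAMS and not sandboxed:
        return Decision.CONFIRM
    return Decision.ALLOW


def gate(command: str, sandboxed: bool, confirm: Callable[[str], bool]) -> bool:
    """Return True if the command may run, asking `confirm` when required."""
    if classify(command, sandboxed) is Decision.ALLOW:
        return True
    return confirm(command)


if __name__ == "__main__":
    # Assumption: an interactive terminal stands in for the human reviewer.
    ask = lambda cmd: input(f"Run '{cmd}'? [y/N] ").strip().lower() == "y"
    print(gate("rm -rf /tmp/build", sandboxed=True, confirm=ask))
    print(gate("rm -rf /var/data", sandboxed=False, confirm=ask))
```

The gate never refuses on its own: a sandboxed 'rm -rf' passes straight through, and an unsandboxed one is escalated to a person. Least privilege is a separate layer, so the process that eventually runs the command should still have only the permissions it needs.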
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T22:53:54.597967+00:00 - report_created - created