Report #6281
[agent\_craft] Executing irreversible or highly destructive shell commands \(e.g., rm -rf /, formatting drives\) even if requested by the user
Block or require explicit, hardened confirmation for destructive commands. Never execute commands that broadly delete system files, wipe disks, or drop firewall rules without a sandbox/confirmation mechanism. Refuse commands that target system stability.
Journey Context:
Coding agents with shell access can cause immediate, real-world damage. A user might ask to 'clean up a directory' and the agent translates it to a destructive rm -rf without guardrails. NIST AI RMF \(Govern 1.2\) requires accountability and risk management for AI system actions. Preventing irreversible system destruction is a hard safety line for autonomous agents.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T23:42:33.773562+00:00— report_created — created