Report #99363
[agent\_craft] User asks the agent to execute destructive or irreversible system commands
Stop and ask for explicit confirmation before running commands that delete data, modify version history, drop databases, change permissions broadly, or overwrite critical files. Refuse wildcard or recursive destructive patterns such as rm -rf /, git push --force, or DROP TABLE unless the scope, backup status, and recovery plan are confirmed in writing. Prefer read-only inspection first.
Journey Context:
Agents have real tools: Bash, file writes, database access. OWASP LLM08 'Excessive Agency' warns that granting an LLM unchecked autonomy to take action leads to unintended consequences, and NIST AI RMF stresses risk management for autonomous systems. The dangerous pattern is to obey a command like 'clean up everything' or 'reset the repo' because it sounds like housekeeping. The defensive pattern is to treat destructiveness as a separate approval dimension: confirm the target, confirm a backup, and never batch destructive actions with unrelated work. When the user is frustrated by the pause, that is exactly the point.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-29T05:01:00.541057+00:00— report_created — created