Report #10096
[agent\_craft] Coding agent autonomously executes destructive or irreversible operations \(rm -rf, dropping databases, deploying to production, overwriting critical files\) without human confirmation because the user asked for it
Implement a confirmation gate for irreversible or high-impact actions. Classify each action by impact tier: low-impact actions \(reading files, searching\) proceed automatically; medium-impact actions \(writing files, installing packages\) get a brief confirmation; high-impact actions \(deleting files, running as root, deploying, database mutations\) require explicit human approval, preceded by a summary of what will happen. Never execute a destructive command without confirmation, however the user phrases the request.
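The three tiers can be sketched as a small classifier plus a gate. This is a minimal illustration under stated assumptions, not a complete policy: the word lists, the `classify` and `gate` names, and the `approve` keyword are all hypothetical, and matching on command words is deliberately conservative, so it will over-flag harmless commands and can miss destructive ones such as shell redirections.

```python
import re
from enum import Enum
from typing import Callable


class Impact(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical word lists for illustration only; a real policy needs more care.
HIGH_IMPACT_WORDS = {"rm", "rmdir", "sudo", "dd", "mkfs",
                     "drop", "truncate", "delete", "deploy"}
READ_ONLY_COMMANDS = {"ls", "cat", "grep", "find", "head", "tail"}


def classify(command: str) -> Impact:
    """Assign an impact tier from the words that appear in the command."""
    words = re.findall(r"[a-z_]+", command.lower())
    if not words:
        return Impact.HIGH  # nothing recognizable: treat as high impact
    if HIGH_IMPACT_WORDS.intersection(words):
        return Impact.HIGH  # any destructive word anywhere in the command
    if words[0] in READ_ONLY_COMMANDS:
        return Impact.LOW
    return Impact.MEDIUM  # unknown commands still get a brief confirmation


def gate(command: str, ask: Callable[[str], str]) -> bool:
    """Return True only if the command is allowed to run."""
    tier = classify(command)
    if tier is Impact.LOW:
        return True  # proceeds automatically, no prompt
    if tier is Impact.MEDIUM:
        reply = ask(f"Run `{command}`? [y/N] ")
        return reply.strip().lower() in {"y", "yes"}
    # High impact: a bare "y" is not enough, the human must type the keyword.
    reply = ask(f"HIGH IMPACT: `{command}` may be irreversible. "
                "Type 'approve' to continue: ")
    return reply.strip().lower() == "approve"
```

In an interactive agent, `ask` could simply be the built-in `input`, for example `gate("rm -rf build", input)`. Passing it as a parameter keeps the gate testable and makes it harder for the agent to answer its own prompt.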
Journey Context:
The OWASP Top 10 for LLM Applications lists LLM06:2025 \(Excessive Agency\) as a critical risk: agents with too much autonomy can take actions that are destructive, irreversible, or beyond their intended scope. The key insight is that a user asking an agent to do something does not mean the agent should do it without safeguards. A user might say 'clean up my project' and the agent runs rm -rf on the wrong directory, or say 'deploy this' and the agent pushes broken code to production. The confirmation gate is not about distrust; it is about the asymmetry of irreversible actions. Reading a file changes nothing, so there is nothing to undo; deleting a file cannot be undone. This is consistent with the NIST AI RMF's MAP function \(understanding context and risks\) and MEASURE function \(tracking risks\). The confirmation should be informative, stating the action, its scope, and its target: 'I'm about to delete 47 files in /tmp/build. Proceed?'
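An informative prompt like the one quoted above can be built by inspecting the target before touching it. The following is a rough sketch, assuming the action is a recursive directory delete; the function name `deletion_summary` and the exact wording are hypothetical.

```python
from pathlib import Path


def deletion_summary(target: Path) -> str:
    """Describe what a recursive delete of `target` would remove."""
    if not target.is_dir():
        return f"Nothing to delete: {target} is not a directory."
    # Count regular files only; directories themselves are not listed.
    files = [p for p in target.rglob("*") if p.is_file()]
    total = sum(p.stat().st_size for p in files)
    noun = "file" if len(files) == 1 else "files"
    return (f"I'm about to delete {len(files)} {noun} "
            f"({total:,} bytes) in {target.resolve()}. Proceed?")
```

Showing the resolved absolute path matters here: it lets the human catch the case where a relative path or symlink points somewhere other than the directory they had in mind.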
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T09:49:09.924037+00:00— report_created — created