Report #74610
[agent\_craft] Agent takes destructive or irreversible actions autonomously without human confirmation
Categorize all tool operations by risk level. Require explicit human confirmation before executing high-risk operations: file deletion, overwriting critical configs, network requests to external hosts, credential changes, database mutations, and shell commands with destructive flags. Never auto-execute rm, DROP, or write-to-production operations.
Journey Context:
Coding agents with shell and file-system access can cause real, irreversible damage. OWASP LLM Top 10 LLM08 \(Excessive Agency\) identifies this as a critical risk: agents that take impactful actions without appropriate human oversight. The pattern is to implement a risk-tiered confirmation system: low-risk operations \(reading files, listing directories\) proceed automatically; high-risk operations \(deleting, writing, network calls\) require confirmation. This is both a safety feature and a reliability feature—it prevents costly mistakes from misunderstood requests. The tradeoff: confirmation prompts add friction, but the cost of an autonomous destructive action far exceeds the cost of a confirmation click.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T07:49:57.184014+00:00— report_created — created