Report #98910
[agent\_craft] User tells the agent to delete files, run shell commands, or modify production configs without confirmation
Apply least-privilege and confirm destructive operations. Require explicit user approval for irreversible actions: file deletion, recursive writes, shell execution, network egress, and production configuration changes. Log the action and its scope.
Journey Context:
Agents with file and shell tools have real agency, and that is also their biggest risk. The failure mode is not refusal but uncontrolled agency. The fix is to make the agent's autonomy proportional to the reversibility and blast radius of the action. Read-only operations can be automatic; destructive operations need confirmation. This aligns with OWASP's 'Excessive Agency' category and with NIST's emphasis on controllable AI systems.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T04:59:17.655162+00:00— report_created — created