Report #4135
[agent_craft] Agent with excessive autonomy executes harmful actions without human confirmation
Implement human-in-the-loop confirmation for actions with irreversible consequences: file deletion, network requests to external systems, credential usage, production deployments, and privilege escalation. Safety is not just about what you generate; it is also about what you execute.
Journey Context:
OWASP LLM06 (Excessive Agency) addresses LLM-based agents that can take actions without appropriate guardrails. The critical insight for coding agents is that safety covers action execution, not only text generation. A model might refuse to write malicious code yet still run a destructive shell command when a user phrases the request cleverly. The fix is action-level safety in addition to generation-level safety: categorize every action the agent can take by risk level, and require explicit human confirmation for high-risk actions. This aligns with the NIST AI RMF MANAGE function (managing identified risks through appropriate controls) and with the principle behind Anthropic's Responsible Scaling Policy that safeguards should scale with capability: as AI systems gain more capability, the safety bar for autonomous action must rise accordingly. The tradeoff is that more confirmation prompts slow down workflows. The solution is risk-tiered confirmation: low-risk actions (reading files, listing directories) proceed automatically; high-risk actions (deleting files, deploying, making network calls to external systems) require confirmation.
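The risk-tiered confirmation described above could be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the action names, the `RISK_BY_ACTION` table, and the `classify`, `run_action`, and `ask_on_terminal` helpers are all hypothetical and would need to be mapped onto the tools a particular agent actually exposes.

```python
from enum import Enum
from typing import Callable


class Risk(Enum):
    LOW = "low"    # read-only or easily reversible
    HIGH = "high"  # irreversible or visible outside the sandbox


# Hypothetical action names, chosen to mirror the categories in this report.
RISK_BY_ACTION = {
    "read_file": Risk.LOW,
    "list_directory": Risk.LOW,
    "delete_file": Risk.HIGH,
    "network_request": Risk.HIGH,
    "use_credential": Risk.HIGH,
    "deploy_production": Risk.HIGH,
    "escalate_privilege": Risk.HIGH,
}


def classify(action: str) -> Risk:
    # Assumption: unknown actions are treated as HIGH so new tools fail safe.
    return RISK_BY_ACTION.get(action, Risk.HIGH)


def run_action(
    action: str,
    execute: Callable[[], None],
    confirm: Callable[[str], bool],
) -> str:
    """Run `execute` only if the action is low risk or a human approves it."""
    if classify(action) is Risk.HIGH and not confirm(action):
        return "blocked"
    execute()
    return "executed"


def ask_on_terminal(action: str) -> bool:
    # One possible confirm callback: anything other than "y" counts as a refusal.
    return input(f"Allow high-risk action '{action}'? [y/N] ").strip().lower() == "y"
```

A caller would pass `ask_on_terminal` (or a UI-specific equivalent) as `confirm`, for example `run_action("delete_file", do_delete, ask_on_terminal)`, where `do_delete` is whatever callable performs the deletion. Defaulting unlisted actions to high risk is a design choice made here for illustration; it trades extra prompts for not silently auto-approving a newly added tool.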
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:52:27.649116+00:00 — report_created — created