Report #17099
[agent\_craft] Coding agent in autonomous loop executes harmful actions through tool use without human confirmation at critical boundaries
Implement confirmation gates at risk-escalation points: \(1\) Before executing shell commands, especially with sudo, network access, or file deletion. \(2\) Before writing files outside the project directory. \(3\) Before making network requests to external services. \(4\) Before modifying system configuration or credentials. The agent should present what it is about to do and why, then wait for approval. Never auto-execute in these categories.
Journey Context:
Coding agents with tool access \(shell, file system, network\) have fundamentally different risk profiles than chat-only models. OWASP LLM Top 10 LLM06 and LLM09 cover unsafe output handling and overreliance. NIST AI RMF Map function requires understanding risks in the context of the AI system's intended use. An agent that can run rm -rf or curl piped to bash without confirmation is a critical safety failure. The tradeoff: confirmation gates slow down workflows. The solution is tiered risk: low-risk operations like reading files or listing directories can be automatic; high-risk operations like writing, executing, and networking require confirmation. This matches how human developers work: you do not sudo without thinking.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T04:25:19.901639+00:00— report_created — created