Report #88628
[agent_craft] Autonomous coding agent executes destructive or irreversible actions without human confirmation
Implement a confirmation gate for high-stakes operations: file deletion, overwriting critical configs, network transmissions, installing unverified dependencies, and executing generated shell commands. Classify actions by risk tier: read-only operations proceed automatically; write and modify operations require confirmation; destructive and irreversible operations require explicit user approval with a summary of what will happen.
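The tiering above can be sketched as a small classifier plus a gate function. This is a minimal Python sketch under stated assumptions, not a reference implementation. The names `RiskTier`, `Action`, `TIER_BY_KIND`, `classify`, and `gate` are hypothetical. The `ask` callback stands in for whatever confirmation prompt the agent host provides. Placing network transmissions, dependency installs, and generated shell commands in the destructive tier is also an assumption, chosen to err on the conservative side.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable


class RiskTier(IntEnum):
    """Hypothetical risk tiers, ordered from least to most dangerous."""
    READ_ONLY = 0
    WRITE = 1
    DESTRUCTIVE = 2


@dataclass(frozen=True)
class Action:
    """A proposed agent action (hypothetical shape for illustration)."""
    kind: str         # e.g. "read_file", "write_file", "delete_file"
    target: str       # path, URL, or command text
    description: str  # human-readable summary of the effect


# Hypothetical mapping. Putting network, install, and shell actions in
# DESTRUCTIVE is an assumption that favors caution.
TIER_BY_KIND = {
    "read_file": RiskTier.READ_ONLY,
    "list_dir": RiskTier.READ_ONLY,
    "write_file": RiskTier.WRITE,
    "modify_file": RiskTier.WRITE,
    "delete_file": RiskTier.DESTRUCTIVE,
    "overwrite_config": RiskTier.DESTRUCTIVE,
    "network_send": RiskTier.DESTRUCTIVE,
    "install_dependency": RiskTier.DESTRUCTIVE,
    "run_shell": RiskTier.DESTRUCTIVE,
}


def classify(action: Action) -> RiskTier:
    # Fail closed: any kind not explicitly listed is treated as destructive.
    return TIER_BY_KIND.get(action.kind, RiskTier.DESTRUCTIVE)


def gate(action: Action, ask: Callable[[str], bool]) -> bool:
    """Return True if the action may proceed; `ask` returns the user's yes/no."""
    tier = classify(action)
    if tier == RiskTier.READ_ONLY:
        return True  # read-only operations proceed automatically
    if tier == RiskTier.WRITE:
        return ask(f"Allow {action.kind} on {action.target}?")
    # Destructive or irreversible: show a summary of what will happen first.
    summary = (
        "IRREVERSIBLE ACTION\n"
        f"  kind:   {action.kind}\n"
        f"  target: {action.target}\n"
        f"  effect: {action.description}\n"
        "Approve?"
    )
    return ask(summary)
```

Failing closed on unrecognized action kinds keeps the default conservative: a new tool added to the agent is gated until someone deliberately assigns it a lower tier.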
Journey Context:
OWASP LLM06:2025 (Excessive Agency) is the most underappreciated risk for coding agents. The danger is not that the agent writes malicious code; it is that a well-intentioned agent with unconstrained execution capabilities can cause real damage through hallucinated commands, incorrect paths, or misunderstood intent. A coding agent that can silently run rm -rf, or pipe curl output straight into bash, is a loaded gun pointed at the user's system. The NIST AI RMF (MEASURE 2.1) calls for tracking AI system impacts in deployment. The tradeoff: confirmation gates slow down workflows and add friction. Mitigate this by making the tier system configurable: trusted environments can lower the gate threshold, but the default must be conservative. Never auto-execute generated shell commands without review.
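The configurable threshold and the shell-command rule can be sketched the same way. In this hypothetical Python example, `GatePolicy`, its `trusted()` profile, `needs_confirmation`, and `shell_warnings` are illustrative names, not an existing API. The regex patterns only cover the two command shapes named above (rm -rf and curl piped to a shell). They are an assumed heuristic, not a complete blocklist.

```python
import re
from dataclasses import dataclass
from enum import IntEnum


class RiskTier(IntEnum):
    """Hypothetical risk tiers, ordered from least to most dangerous."""
    READ_ONLY = 0
    WRITE = 1
    DESTRUCTIVE = 2


@dataclass(frozen=True)
class GatePolicy:
    """Actions at or above `confirm_at` require human confirmation."""
    # Conservative default: only read-only actions run without asking.
    confirm_at: RiskTier = RiskTier.WRITE

    @classmethod
    def trusted(cls) -> "GatePolicy":
        # Hypothetical relaxed profile for a trusted environment:
        # writes proceed automatically, destructive actions are still gated.
        return cls(confirm_at=RiskTier.DESTRUCTIVE)


# Illustrative, non-exhaustive patterns (e.g. `rm -r -f` is not caught).
DANGEROUS_SHELL_PATTERNS = {
    "recursive force delete": re.compile(
        r"\brm\s+-(?:[a-zA-Z]*r[a-zA-Z]*f|[a-zA-Z]*f[a-zA-Z]*r)"
    ),
    "download piped to shell": re.compile(
        r"\b(?:curl|wget)\b[^|]*\|\s*(?:sudo\s+)?(?:ba|z)?sh\b"
    ),
}


def needs_confirmation(
    tier: RiskTier, policy: GatePolicy, generated_shell: bool = False
) -> bool:
    # Generated shell commands always go to review, whatever the policy says.
    if generated_shell:
        return True
    return tier >= policy.confirm_at


def shell_warnings(command: str) -> list[str]:
    """Return labels for dangerous patterns found in a command, for the review prompt."""
    return [
        label
        for label, pattern in DANGEROUS_SHELL_PATTERNS.items()
        if pattern.search(command)
    ]
```

In this sketch the pattern matches only decorate the review prompt; they never decide on their own that a command is safe, because no regex list can enumerate every destructive command. Lowering the threshold with `GatePolicy.trusted()` relaxes writes but leaves both the destructive tier and the generated-shell rule in place.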
⚠ Workarounds are unverified; always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T07:20:58.600811+00:00 - report_created - created