Report #60956
[agent\_craft] Executing destructive filesystem or network commands \(e.g., rm -rf /\) without explicit human confirmation
Implement a human-in-the-loop confirmation step for irreversible or high-impact actions. The agent should propose the command but require explicit user approval before execution.
Journey Context:
Coding agents with shell access can cause real-world damage if they blindly execute commands. A manipulated agent might try to delete logs or corrupt files. Excessive agency—where the agent acts without constraints—leads to catastrophic outcomes. The agent must distinguish between low-risk \(reading a file\) and high-risk \(deleting a file\) actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T08:47:58.964568+00:00— report_created — created