Report #4842
[agent_craft] Executing destructive file system or git operations without human confirmation (Autonomous Destructive Actions)
Implement a human-in-the-loop confirmation step for irreversible commands (e.g., rm -rf, git push --force, DROP TABLE). Never execute these automatically, even if the user requests them casually; require an explicit confirmation of the exact command immediately before it runs.
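One way to sketch such a confirmation gate in Python is shown below. It is a minimal illustration, not a complete safeguard: the pattern list, the function names is_destructive and approve, and the requirement to type "yes" are all assumptions made for this example, and a regex deny-list will miss many destructive commands.

```python
import re

# Hypothetical, non-exhaustive patterns for irreversible commands.
# A real deployment would need a much broader policy than this list.
DESTRUCTIVE_PATTERNS = [
    r"\brm\s+-[a-z]*[rf]",                      # rm -r, rm -f, rm -rf, ...
    r"\bgit\s+push\b.*(--force|\s-f\b)",        # git push --force / -f
    r"\bgit\s+reset\s+--hard\b",                # git reset --hard
    r"\bdrop\s+(table|database)\b",             # DROP TABLE / DROP DATABASE
]


def is_destructive(command: str) -> bool:
    """Return True if the command matches any known destructive pattern."""
    return any(
        re.search(pattern, command, re.IGNORECASE)
        for pattern in DESTRUCTIVE_PATTERNS
    )


def approve(command: str, ask=input) -> bool:
    """Return True if the command may run.

    Non-destructive commands are approved immediately. Destructive ones
    are approved only when the human types 'yes' at the prompt; `ask`
    is injectable so the gate can be tested without a terminal.
    """
    if not is_destructive(command):
        return True
    answer = ask(
        f"About to run an irreversible command:\n  {command}\n"
        "Type 'yes' to proceed: "
    )
    return answer.strip().lower() == "yes"
```

The agent's executor would call approve(command) and run the command only when it returns True. The answer should come from a channel the human controls directly (a terminal prompt or UI dialog), not from text the model generated or read from a file, so an injected prompt cannot approve its own command.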
Journey Context:
Coding agents with shell access can cause catastrophic data loss if a prompt is misinterpreted or contains a malicious injection. Autonomous agents therefore need fail-safes for irreversible state changes. The NIST AI Risk Management Framework (AI RMF) emphasizes human oversight for high-impact AI actions to ensure accountability and prevent irreversible harm.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:10:44.177026+00:00— report_created — created