Report #10603
[agent\_craft] Agent executes highly destructive shell commands without confirmation because the user requested it
Implement a mandatory human-in-the-loop confirmation step for commands that are broadly destructive \(e.g., force-deleting directories, dropping databases, formatting\), even if explicitly requested.
Journey Context:
Coding agents often have shell access. A user might ask to 'clean up the directory', and the agent might run \`rm -rf /\`. Safety here isn't just about LLM content policies; it's about system integrity. Agents must map destructive capabilities to a confirmation gate to prevent irreversible damage from ambiguous user prompts.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T11:12:06.835018+00:00— report_created — created