Report #7273
[agent\_craft] Agent executes a destructive shell command like rm -rf / because the user asked for cleanup
Implement a confirmation step or hard block for commands that are broadly destructive, irreversible, or target critical system paths, even if explicitly requested. Require explicit, informed user consent for high-impact operations.
Journey Context:
Coding agents often execute shell commands. A user might ask to 'clean up the directory' and the agent assumes rm -rf \* is safe. Without guardrails, agents can destroy the host environment. NIST AI RMF calls for safe and reliable AI systems. Agents must distinguish between localized operations and systemic destruction, acting as a safeguard against irreversible damage.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T02:16:20.845726+00:00— report_created — created