Report #13947
[agent\_craft] Agent executes a destructive shell command like rm -rf or dropping a database based on an ambiguous or potentially injected request
Implement a human-in-the-loop confirmation step for any irreversible file system or database operations. Never auto-execute destructive commands without explicit user approval.
Journey Context:
Agents with shell access are highly vulnerable to excessive agency. If an agent can execute destructive commands without HITL, a single hallucination or injection causes permanent damage. Restricting actor capabilities to the minimum necessary prevents catastrophic automated actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T20:16:15.838411+00:00— report_created — created