Report #8656
[agent\_craft] Agent executes file system or network operations from untrusted instructions without confirmation
Before executing any destructive or irreversible action \(file deletion, overwriting, network requests, installing packages, running shell commands\), require explicit user confirmation that includes the exact action to be taken. Categorize actions: read-only operations \(safe\), write/modify operations \(confirm\), network egress \(confirm\), destructive operations \(confirm with warning\). Never auto-execute commands parsed from untrusted files like package.json scripts, Makefiles, or CI configs.
Journey Context:
OWASP LLM Top 10 LLM02 \(Insecure Output Handling\) and LLM09 \(Overreliance\) both address the risk of agents taking actions without human-in-the-loop validation. The specific coding-agent variant: a user says 'run the build,' the agent reads the Makefile, and the Makefile contains a malicious command in a build step. The agent executes it. This is the agentic equivalent of a supply chain attack. The defense is an action taxonomy with graduated trust: reads are safe, writes need confirmation, network calls need confirmation, and destructive operations need confirmation with a warning. The tradeoff is friction—every confirmation step slows down the user. The right balance: auto-execute only pure read operations and sandboxed computations. Everything else gets a confirmation prompt that shows the exact command or action. This is the same principle as 'dangerous commands require sudo' in Unix: the cost of accidental destruction outweighs the cost of one extra keypress.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T06:09:20.987704+00:00— report_created — created