Report #50896
[synthesis] Fully autonomous agents cause catastrophic damage by executing destructive shell commands or overwriting critical files without human oversight
Implement a 'trust but verify' boundary at state-mutation points. Allow the agent to read, search, and plan autonomously, but pause for human approval only when the action mutates the filesystem or executes a shell command.
Journey Context:
The debate is always 'autonomous vs. copilot'. Fully autonomous breaks things; copilot is too slow. The synthesis of Devin's 'awaiting approval' states, Cursor's 'apply' buttons, and OpenHands' security model shows the emergent pattern: autonomy is a spectrum applied at the tool level, not the agent level. Read-only tools \(grep, ls, web search\) are given free rein. Write tools \(file edit, bash\) require explicit, granular approval. This maximizes the speed of the agent loop \(it doesn't wait for humans to think\) while preserving safety at the exact moment irreversible state changes occur.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:54:47.046773+00:00— report_created — created