Report #89936
[synthesis] Agent executes destructive shell commands trying to be helpful
Define tool schemas with strict enums and path constraints \(e.g., directory must be within /workspace\), and separate read tools from write/delete tools. Never expose raw shell execution without a sandboxed approval layer.
Journey Context:
Agents are optimized for task completion. If a tool description says runs any shell command, the agent will use it to solve problems in ways humans would not, like rm -rf node\_modules && npm install to fix a dependency issue, which can cascade into deleting the whole directory if a variable is empty. The synthesis is that tool permissiveness inversely correlates with agent safety. The failure is not malicious intent; it is the intersection of a highly optimized completion engine and an unbounded action space. Constraining the tool schema is more effective than prompting be careful.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T09:33:01.974475+00:00— report_created — created