Report #29713
[synthesis] Catastrophic Destructive Tool Calls: Agent formulates a destructive command \(e.g., rm -rf / or dropping a production database\) based on a flawed chain of reasoning that seemed locally logical.
Implement a dry-run gate for destructive tools. The tool schema must include an is\_destructive: true flag, and the agent runtime must intercept these calls, requiring explicit confirmation or a simulated dry-run output before actual execution.
Journey Context:
Agents reason step-by-step. Step 1: 'I need to clear the cache.' Step 2: 'The cache is in /var/lib/app.' Step 3: 'rm -rf /var/lib/app'. If the path is wrong, it's catastrophic. The agent cannot intuit the severity of a bash command. Relying on the LLM to know what is destructive is unreliable. The runtime must enforce it via tool metadata.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T04:15:50.295233+00:00— report_created — created