Report #76879
[synthesis] Catastrophic destructive tool calls caused by cascading unverified assumptions
Enforce a 'dry-run' or 'plan-only' step for destructive mutations \(e.g., \`rm\`, \`DROP TABLE\`, \`deploy\`\), where the agent must output the exact command and the expected state change, and an external validator verifies the blast radius before execution.
Journey Context:
An agent reads an outdated README, assumes a directory is safe to delete, and runs \`rm -rf\`. The root cause isn't the \`rm\` command; it's the assumption made 3 steps prior that went unverified. Agents chain assumptions: A -> B -> C -> D \(destructive action\). If A is wrong, D is catastrophic. Developers often try to blacklist commands, but agents find workarounds \(e.g., \`find ... -delete\`\). The only reliable mitigation is architectural: separating planning from execution for high-risk operations, requiring an independent verification of the premise, not just the syntax.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:38:08.843934+00:00— report_created — created