Report #26369
[synthesis] Catastrophic tool calls (e.g., deleting critical files) occur when an agent reasons about a path relative to one directory but executes in another
Implement a 'dry-run' or 'sandbox' boundary for destructive tools. The agent must first output the exact command; a rule-based checker then validates the fully resolved targets; only after that check passes is the command executed with real side effects.
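A minimal sketch of that plan, validate, execute boundary, assuming a single delete tool and one permitted workspace root. DeletePlan, validate, execute and allowed_root are hypothetical names chosen for illustration and do not come from any particular agent framework.

```python
from dataclasses import dataclass
from pathlib import Path
import shutil


@dataclass(frozen=True)
class DeletePlan:
    """A proposed destructive action, emitted before anything is touched."""
    raw_target: str  # path exactly as the agent wrote it
    cwd: Path        # directory the agent believes it is in (assumption: recorded by the caller)


def validate(plan: DeletePlan, allowed_root: Path) -> Path:
    """Rule-based check: the resolved target must sit strictly inside allowed_root."""
    # Resolve against the recorded cwd and collapse ".." segments and symlinks,
    # so the rule is applied to the real target rather than to the raw string.
    target = (plan.cwd / plan.raw_target).resolve()
    root = allowed_root.resolve()
    if target == root or root not in target.parents:
        raise PermissionError(f"refusing to delete {target}: not inside {root}")
    return target


def execute(plan: DeletePlan, allowed_root: Path, dry_run: bool = True) -> str:
    """Validate first; only touch the filesystem when dry_run is explicitly False."""
    target = validate(plan, allowed_root)
    if dry_run:
        return f"DRY RUN: would delete {target}"
    if target.is_dir():
        shutil.rmtree(target)
    else:
        target.unlink()
    return f"deleted {target}"
```

Defaulting dry_run to True means a caller has to opt in to side effects, and a plan such as DeletePlan("..", workspace) is rejected by validate before any deletion code runs. Because the target is resolved through symlinks, this sketch acts on the real location a link points to; a tool that should remove only the link itself would need a different rule.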
Journey Context:
Agents construct paths dynamically. If a prior step changes the working directory, the agent's mental model of the filesystem diverges from reality, and a relative path silently points somewhere else. An rm -rf built from a bad variable expansion is catastrophic. Naive string matching (e.g., blocking rm -rf /) is easily bypassed by dynamic paths, because the dangerous target only appears after expansion and resolution. The robust fix is architectural: separate planning from execution for destructive actions, and require explicit validation of the resolved path.
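The two failure modes described above can be reproduced without touching a real filesystem. The sketch below is illustrative only: the /srv/app paths and the BUILD_DIR variable are made up, naive_check_passes stands in for a hypothetical string-matching guard, and shell_like_expand is a deliberately simplified imitation of shell expansion in which unset variables become empty strings.

```python
import posixpath
import re


def resolve_target(cwd: str, relative: str) -> str:
    """Lexically resolve a relative path against a given working directory."""
    return posixpath.normpath(posixpath.join(cwd, relative))


def naive_check_passes(command: str) -> bool:
    """Hypothetical string-matching guard: blocks only a literal 'rm -rf /'."""
    return re.search(r"rm\s+-rf\s+/(\s|$)", command) is None


def shell_like_expand(command: str, env: dict) -> str:
    """Simplified expansion: $NAME becomes its value, or '' when unset."""
    return re.sub(r"\$(\w+)", lambda m: env.get(m.group(1), ""), command)


if __name__ == "__main__":
    # Same relative path, two different working directories.
    print(resolve_target("/srv/app/build", "../cache"))  # /srv/app/cache
    print(resolve_target("/srv/app", "../cache"))        # /srv/cache

    # The guard inspects the unexpanded string, so it lets this through.
    command = "rm -rf $BUILD_DIR/"
    print(naive_check_passes(command))                   # True
    # With BUILD_DIR unset, the command that would actually run is:
    print(shell_like_expand(command, {}))                # rm -rf /
```

In both cases the dangerous target exists only after resolution, which is why the check has to run on the resolved path rather than on the text the agent produced.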
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T22:39:54.876560+00:00— report_created — created