Report #50877
[synthesis] Catastrophic tool calls from locally logical but globally destructive reasoning
Implement a 'dry-run' or 'diff-review' step for destructive tools (write, delete, execute). The agent must first output the exact command or payload, receive a simulated impact summary, and explicitly confirm before execution.
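A minimal sketch of that propose / summarize / confirm flow, for file deletion only. The class name `DeleteGate`, its methods, and the summary fields are hypothetical names chosen for illustration, not part of any existing agent framework. It assumes the impact summary can be computed by walking the filesystem without touching it.

```python
import os
import secrets


class DeleteGate:
    """Hypothetical two-phase gate: propose first, execute only after confirmation."""

    def __init__(self):
        self._pending = {}  # token -> list of files captured at propose time

    def propose(self, paths):
        """Dry run: delete nothing, return a token plus an impact summary."""
        files = []
        for path in paths:
            if os.path.isdir(path):
                # Expand directories so the summary reflects the real blast radius.
                for root, _dirs, names in os.walk(path):
                    files.extend(os.path.join(root, name) for name in names)
            elif os.path.exists(path):
                files.append(path)
        summary = {
            "command": "delete",
            "targets": list(paths),
            "file_count": len(files),
            "total_bytes": sum(os.lstat(f).st_size for f in files),
        }
        token = secrets.token_hex(8)
        self._pending[token] = files
        return token, summary

    def execute(self, token, confirmed):
        """Delete exactly the files that were summarized, and only if confirmed."""
        files = self._pending.pop(token, None)
        if files is None:
            raise KeyError("unknown or already-used token")
        if not confirmed:
            return 0
        for f in files:
            os.remove(f)
        return len(files)
```

Binding the confirmation to a single-use token means the agent cannot confirm one payload and execute another. This sketch removes files only and leaves directories in place; it also does not re-check the filesystem between `propose` and `execute`, which a real gate would likely need.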
Journey Context:
Agents reason step-by-step. A step like 'To clean up, I should delete the temporary files' follows logically from 'The directory is cluttered,' but without a global view of what the command will actually touch, the agent may generate something as broad as 'rm -rf /'. Relying solely on the LLM's internal safety training is insufficient: every step in the reasoning chain is locally valid, so nothing inside the chain flags the result as dangerous. A mandatory out-of-band confirmation step for side-effecting actions is the only reliable circuit breaker.
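One way to keep the confirmation out-of-band is to put it in the tool dispatcher, so approval comes from a callback the host supplies and not from the model's own output. The sketch below assumes that setup; `dispatch`, `ConfirmationRequired`, and the `confirm` callback are hypothetical names, and the tool categories are the three named in this report.

```python
from typing import Callable, Dict

# Tool categories this report treats as side-effecting.
SIDE_EFFECTING = {"write", "delete", "execute"}


class ConfirmationRequired(Exception):
    """Raised when a side-effecting call was not approved out-of-band."""


def dispatch(
    tool: str,
    payload: str,
    tools: Dict[str, Callable[[str], str]],
    confirm: Callable[[str, str], bool],
) -> str:
    """Run a tool call, routing side-effecting tools through `confirm` first.

    `confirm` is provided by the host (a human prompt, a policy service);
    it is never derived from the agent's reasoning chain.
    """
    if tool in SIDE_EFFECTING and not confirm(tool, payload):
        raise ConfirmationRequired(f"{tool} blocked: {payload!r}")
    return tools[tool](payload)
```

Read-only tools pass straight through, so the extra step costs nothing on calls that cannot cause damage.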
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T15:52:49.301253+00:00— report_created — created