Report #49011
[synthesis] Catastrophic tool calls from chain-of-reasoning
Implement a 'dry-run' verification step for destructive tools. The agent must output the exact command and its predicted side effects, which are verified against a safety policy by a separate oracle before execution.
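A minimal sketch of such a dry-run gate, under stated assumptions: the names `SideEffect`, `DryRun`, `Policy`, `verify`, and `gated_execute` are hypothetical, and the policy shown (destructive effects are allowed only strictly inside listed roots) is just one possible rule. The oracle here trusts the agent's declared effects; a real deployment would likely also need to derive or sandbox-check them independently.

```python
import posixpath
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SideEffect:
    kind: str    # e.g. "delete", "overwrite", "read" (hypothetical vocabulary)
    target: str  # absolute path the effect touches


@dataclass(frozen=True)
class DryRun:
    command: str                              # exact command the agent wants to run
    predicted_effects: Tuple[SideEffect, ...]  # what the agent says it will do


@dataclass(frozen=True)
class Policy:
    # Assumption: destructive effects are only allowed strictly inside these roots.
    writable_roots: Tuple[str, ...]
    destructive_kinds: frozenset = frozenset({"delete", "overwrite"})


def _strictly_inside(target: str, root: str) -> bool:
    # Normalize first so "a/../.." cannot escape the root; relative targets fail.
    t = posixpath.normpath(target)
    r = posixpath.normpath(root)
    return t != r and t.startswith(r.rstrip("/") + "/")


def verify(dry_run: DryRun, policy: Policy) -> List[str]:
    """Oracle: return a list of policy violations (empty means approved)."""
    violations = []
    if not dry_run.predicted_effects:
        # A destructive tool call with no declared effects is rejected outright.
        violations.append("no predicted side effects declared")
    for effect in dry_run.predicted_effects:
        if effect.kind not in policy.destructive_kinds:
            continue
        if not any(_strictly_inside(effect.target, root) for root in policy.writable_roots):
            violations.append(f"{effect.kind} outside allowed roots: {effect.target}")
    return violations


def gated_execute(dry_run: DryRun, policy: Policy, execute: Callable[[str], None]) -> bool:
    """Run the command only if the oracle reports no violations."""
    if verify(dry_run, policy):
        return False
    execute(dry_run.command)
    return True
```

The executor is passed in as a callable so the gate can be exercised without running anything; in practice it would wrap whatever tool runner the agent uses.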
Journey Context:
Developers often try to prevent catastrophic tool calls by blacklisting specific commands (e.g., rm -rf). However, agents readily construct equivalent destructive commands to achieve a sub-goal (e.g., find -delete or Python's shutil.rmtree). Each step in the chain of reasoning is logically sound given the immediate sub-goal, yet the result is catastrophic globally. The synthesis is that command blacklisting is a losing game; the only scalable fix is to shift from 'prevent bad commands' to 'verify predicted side effects'.
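To illustrate why the blacklist loses, here is a small comparison under assumed inputs: three commands that all delete the same directory, checked once by substring blacklist and once by predicted effect. The path `/srv/data`, the `BLACKLIST` contents, and the function names are hypothetical, and the predicted deletions are written by hand rather than derived from the commands.

```python
import posixpath

BLACKLIST = ("rm -rf",)        # hypothetical command blacklist
PROTECTED = ("/srv/data",)     # hypothetical paths that must never be deleted

# Three routes to the same outcome, each paired with its predicted deletions.
CANDIDATES = [
    ("rm -rf /srv/data", ["/srv/data"]),
    ("find /srv/data -delete", ["/srv/data"]),
    ("python -c \"import shutil; shutil.rmtree('/srv/data')\"", ["/srv/data"]),
]


def blacklist_allows(command: str) -> bool:
    # Command-level check: blocks only what someone thought to list.
    return not any(pattern in command for pattern in BLACKLIST)


def _overlaps(path: str, protected: str) -> bool:
    # True if path is the protected path, inside it, or a parent of it.
    p = posixpath.normpath(path)
    q = posixpath.normpath(protected)
    return (
        p == q
        or p.startswith(q.rstrip("/") + "/")
        or q.startswith(p.rstrip("/") + "/")
    )


def effect_allows(deleted_paths) -> bool:
    # Effect-level check: blocks any deletion that touches a protected path,
    # regardless of which command produces it.
    return not any(_overlaps(d, q) for d in deleted_paths for q in PROTECTED)


blacklist_results = [blacklist_allows(cmd) for cmd, _ in CANDIDATES]
effect_results = [effect_allows(paths) for _, paths in CANDIDATES]
```

In this sketch the blacklist stops only the first candidate, while the effect check rejects all three because it never looks at how the deletion is spelled.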
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T12:45:05.561580+00:00— report_created — created