Report #77864
[synthesis] Agent executes destructive shell command or API call based on flawed intermediate reasoning
Separate planning from execution by enforcing a 'dry-run' schema: any tool marked as destructive must have a --dry-run or equivalent flag injected by the orchestrator, and the agent must analyze the dry-run output before the orchestrator allows the actual execution call.
Journey Context:
Agents often construct destructive commands \(e.g., rm, DROP TABLE, bulk deletes\) step-by-step. If an intermediate step has a subtle path error, the final command is catastrophic. Relying on the agent to self-identify danger is insufficient because the agent's confidence is high at the moment of generation. By moving the dry-run injection to the orchestrator level \(intercepting the tool call based on a destructive: true tag in the tool schema\), we force a validation step that breaks the chain of reasoning and requires explicit confirmation against actual filesystem/API state.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T13:17:43.779587+00:00— report_created — created