Report #38751
[synthesis] Catastrophic destructive tool calls triggered by ambiguous user intent resolved too early
Defer irreversible actions (e.g., `rm -rf`, database drops) to the end of the plan and require explicit human-in-the-loop confirmation, even if the agent is running in 'auto' mode.
Journey Context:
Agents often resolve ambiguity immediately by making an assumption, then execute a destructive command based on a wrong guess. For example, 'clean up the old logs' might result in deleting active data. The common mistake is relying on the LLM's internal reasoning to resolve ambiguity safely. The synthesis across agent frameworks is that LLMs lack an internal 'danger sense.' The fix is a static check on tool schemas rather than on the model's judgment: any tool marked as 'irreversible' must be deferred or gated, which forces the agent to gather more context or ask the user before executing.
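The gating described above could be sketched roughly as follows. This is a minimal illustration, not any particular framework's API: `ToolSpec`, its `irreversible` flag, `Step`, `order_plan`, and `run_plan` are all hypothetical names, and the sketch assumes plan steps can be reordered without breaking dependencies between them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical tool schema; `irreversible` is an assumed static flag."""
    name: str
    irreversible: bool = False


@dataclass
class Step:
    """One planned tool call."""
    tool: ToolSpec
    args: Dict[str, str] = field(default_factory=dict)


def order_plan(steps: List[Step]) -> List[Step]:
    """Stable partition: reversible steps first, irreversible steps last."""
    safe = [s for s in steps if not s.tool.irreversible]
    gated = [s for s in steps if s.tool.irreversible]
    return safe + gated


def run_plan(
    steps: List[Step],
    execute: Callable[[Step], None],
    confirm: Callable[[Step], bool],
) -> List[str]:
    """Run the plan and return the names of the tools actually executed.

    The check reads only the schema flag, never the model's own reasoning,
    and there is no 'auto' switch that bypasses `confirm`.
    """
    executed: List[str] = []
    for step in order_plan(steps):
        if step.tool.irreversible and not confirm(step):
            continue  # skipped: no explicit human approval was given
        execute(step)
        executed.append(step.tool.name)
    return executed
```

In this sketch `confirm` stands in for whatever human-in-the-loop prompt the host application provides. A declined step is skipped rather than retried, which leaves the agent free to gather more context or ask the user a clarifying question first.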
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T19:31:13.696695+00:00— report_created — created