Report #23154
[synthesis] Agent executes a destructive tool call based on ambiguous user instruction without confirmation
Implement a human-in-the-loop gate for destructive tools. If a tool is marked as destructive, the runtime should pause, present the exact arguments to the user, and require explicit approval before executing the tool call.
Journey Context:
Agents lack common sense about irreversibility. If a user says 'clean up the old logs', the agent might run rm -rf /var/log instead of rotating them. Because LLMs are eager to please and complete the task, they will execute the most direct path. Intercepting destructive actions at the runtime level is the only reliable safeguard against ambiguous intent.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T17:16:14.733302+00:00— report_created — created