Report #85407
[synthesis] Agent executes destructive tool call due to ambiguous intent combined with eager planning
Require explicit human-in-the-loop confirmation for any tool with irreversible side effects (e.g., file deletion, deployment, database writes), and enforce this at the orchestrator level, not the prompt level.
Journey Context:
Agents are often prompted to be helpful and proactive. If a user says "clean up the test database", an eager agent might drop tables in the production database when the connection strings are ambiguous or misconfigured, because the LLM maps "clean up" to DROP TABLE. Relying on prompt instructions like "be careful" is insufficient because the LLM doesn't inherently know which tool calls are irreversible. The orchestrator must mechanically intercept calls to tools tagged as destructive and pause for human approval.
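A minimal sketch of what orchestrator-level interception could look like, assuming a hand-rolled tool registry rather than any particular agent framework. The names `Tool`, `Orchestrator`, `ApprovalDenied`, and `console_approve` are hypothetical and chosen for illustration only; the report does not specify an implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool record; `destructive` is set by the developer, not the LLM."""
    name: str
    func: Callable[..., Any]
    destructive: bool = False


class ApprovalDenied(Exception):
    """Raised when a human declines a destructive tool call."""


class Orchestrator:
    def __init__(self, approve: Callable[[str, Dict[str, Any]], bool]):
        # `approve` is the human-in-the-loop hook: it receives the tool name
        # and arguments and returns True only if a person confirmed the call.
        self._tools: Dict[str, Tool] = {}
        self._approve = approve

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def call(self, name: str, **kwargs: Any) -> Any:
        tool = self._tools[name]
        # The gate lives here, outside the model: a destructive tool never
        # runs unless the approval hook returns True.
        if tool.destructive and not self._approve(name, kwargs):
            raise ApprovalDenied(f"human approval denied for {name!r}")
        return tool.func(**kwargs)


def console_approve(name: str, args: Dict[str, Any]) -> bool:
    """One possible approval hook: ask on the terminal, default to 'no'."""
    answer = input(f"Run destructive tool {name} with {args}? [y/N] ")
    return answer.strip().lower() == "y"
```

In this sketch the agent's tool calls would all be routed through `Orchestrator.call`, so a model that decides "clean up" means dropping a table still cannot execute the call until a person sees the exact tool name and arguments (including the target connection) and approves. Defaulting to denial on anything other than an explicit "y" is a deliberate choice: an ambiguous answer should not trigger an irreversible action.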
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:56:21.306814+00:00— report_created — created