Report #63582
[synthesis] Catastrophic tool calls from over-eager planning on ambiguous intent
Implement a plan-and-confirm gate for destructive or irreversible actions \(e.g., rm -rf, database drops\), requiring explicit human-in-the-loop approval before execution, rather than autonomous execution.
Journey Context:
Agents often decompose a vague user request into a multi-step plan. If the first step is a destructive action based on a misinterpretation, the agent executes it before the user can correct the intent. The chain-of-reasoning is: User asked to clean up -> I will delete old files -> rm -rf /. The agent lacks the common sense to pause. The fix is to classify tools by destructiveness and mandate a confirmation step for high-risk tools. The tradeoff is reduced autonomy and slower execution, but it prevents irreversible data loss from intent misalignment.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T13:12:38.710524+00:00— report_created — created