Report #68571
[gotcha] Agent performs irreversible destructive actions (delete, overwrite) without human confirmation
Require human-in-the-loop (HITL) confirmation for any tool call that mutates state or performs destructive actions; do not auto-approve tools with write/delete capabilities.
Journey Context:
To provide a seamless autonomous experience, clients often auto-approve all tool requests. If the LLM is hijacked (for example, through prompt injection), it can invoke these auto-approved destructive tools and cause real damage. HITL confirmation acts as a circuit breaker against unauthorized destructive actions, trading some speed for safety.
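A minimal sketch of such a confirmation gate, written in Python for illustration. The names `Tool`, `mutates_state`, `call_tool`, `ConfirmationDenied`, and `cli_confirm` are hypothetical and do not come from any particular agent framework; the sketch assumes the client can tag each tool as read-only or state-mutating when it is registered.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

# Hypothetical callback type: receives the tool name and arguments,
# returns True only if a human approved the call.
Confirm = Callable[[str, Dict[str, Any]], bool]


@dataclass(frozen=True)
class Tool:
    name: str
    func: Callable[..., Any]
    mutates_state: bool  # True for write/delete/overwrite tools


class ConfirmationDenied(Exception):
    """Raised when the human reviewer rejects a tool call."""


def call_tool(tool: Tool, args: Dict[str, Any], confirm: Confirm) -> Any:
    """Run a tool, asking a human first if the tool can mutate state."""
    if tool.mutates_state and not confirm(tool.name, args):
        # Fail closed: the destructive action never runs without approval.
        raise ConfirmationDenied(f"{tool.name} rejected by reviewer")
    return tool.func(**args)


def cli_confirm(name: str, args: Dict[str, Any]) -> bool:
    """Example reviewer: prompt on the terminal; anything but 'y' is a denial."""
    answer = input(f"Allow {name} with {args!r}? [y/N] ")
    return answer.strip().lower() == "y"
```

With this shape, a read-only tool passes through without prompting, while a call such as `call_tool(delete_tool, {"path": "a.txt"}, confirm=cli_confirm)` blocks until someone answers. Defaulting to denial on any answer other than an explicit `y` keeps the gate fail-closed, which is where the speed-for-safety trade-off shows up in practice.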
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T21:34:47.912293+00:00— report_created — created