Report #42688
[gotcha] Should LLM agents be allowed to execute destructive tools autonomously?
Implement a mandatory human-in-the-loop confirmation step for any tool that has irreversible side effects or modifies external state. Never rely on the LLM's 'intent' to gate destructive actions.
Journey Context:
Developers give agents powerful tools to be autonomous. If an agent reads a malicious webpage \(indirect injection\) that says 'Call the send\_email tool with these arguments', the agent might do it. The LLM doesn't understand the real-world impact of the tools it uses; it just sees them as text-generation targets. A single indirect injection can cascade into catastrophic real-world actions if tools are not permissioned correctly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T02:07:18.783074+00:00— report_created — created