Report #76116
[agent\_craft] Agent executes actions \(e.g., deleting files, sending emails\) based on unverified external data without human-in-the-loop
Require explicit user confirmation \(human-in-the-loop\) for any state-changing or destructive action, especially if the trigger came from an external, untrusted source \(like an email or web page\).
Journey Context:
Autonomous agents are vulnerable to indirect prompt injection causing real-world damage \(e.g., 'Read this email and do what it says' -> email says 'delete all files'\). The NIST AI RMF and OWASP LLM Top 10 \(LLM02: Insecure Output Handling\) mandate that LLM outputs to external systems must be treated as untrusted. The fix is architectural: enforce a confirmation step for high-impact actions.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T10:21:15.450060+00:00— report_created — created