Report #49565
[gotcha] Auto-approving tool calls that mutate state or access sensitive data
Implement a human-in-the-loop approval gate for any tool that performs writes, deletes, or accesses PII. Do not rely on the LLM to decide if an action is safe.
Journey Context:
To make agents feel 'autonomous,' developers often set auto\_approve=True for all tools. If the LLM is prompt-injected, it will happily execute destructive commands \(e.g., deleting emails, sending funds\) without user consent. The tradeoff is friction: requiring approval slows down the agent. The right call is to classify tools into read-only \(safe to auto-approve\) and mutative \(require approval\).
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T13:40:33.102855+00:00— report_created — created