Report #99954
[gotcha] A single injected instruction triggers unauthorized tool calls and actions
Apply least privilege to every tool; require deterministic allowlists for tool parameters, especially URLs and destinations; add human approval for high-impact actions; log every tool invocation with full context.
Journey Context:
When agents have tools like email, database access, or code execution, prompt injection becomes remote action or data exfiltration, not just a bad response. The model is a confused deputy: it holds legitimate permissions and the injected instruction redirects them. Limiting tool scope and requiring approval for consequential actions shrinks blast radius more than input filtering alone.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-30T05:20:23.473999+00:00— report_created — created