Report #100417
[gotcha] A prompt injection told my agent to send an email, delete a file, or call an API. How do I stop that?
Apply least-privilege to tools. Require explicit user confirmation for destructive or outbound actions. Validate tool arguments against the user's original intent and a strict schema. Run the planning step with limited context, and separate the privileged decision layer from the layer that reads untrusted content.
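One way to picture these controls is a single gate that every model-issued tool call must pass through. The sketch below is illustrative only: `Tool`, `ToolGate`, the `from_untrusted` flag, and the `confirm` callback are hypothetical names, not part of any particular agent framework, and the two tool functions are stand-ins that touch nothing.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass(frozen=True)
class Tool:
    name: str
    func: Callable[..., Any]
    high_impact: bool  # destructive or outbound: send, delete, external API call


class ToolGate:
    """Hypothetical dispatcher: every tool call from the model goes through call()."""

    def __init__(self, tools: Iterable[Tool], granted: Iterable[str],
                 confirm: Callable[[str, Dict[str, Any]], bool]) -> None:
        self._tools = {t.name: t for t in tools}
        self._granted = set(granted)  # least privilege: only what this task needs
        self._confirm = confirm       # should ask the real user, never auto-accept

    def call(self, name: str, args: Dict[str, Any], from_untrusted: bool) -> Any:
        tool = self._tools.get(name)
        if tool is None or name not in self._granted:
            raise PermissionError(f"tool not granted for this task: {name}")
        if tool.high_impact:
            # Steps that have read untrusted content may not trigger high-impact tools.
            if from_untrusted:
                raise PermissionError(f"{name} cannot be triggered by untrusted content")
            if not self._confirm(name, args):
                raise PermissionError(f"user declined {name}")
        return tool.func(**args)


def read_file(path: str) -> str:
    return f"(contents of {path})"  # stand-in, reads nothing


def delete_file(path: str) -> str:
    return f"deleted {path}"  # stand-in, deletes nothing
```

In this sketch the caller, not the model, sets `from_untrusted`: any step that has ingested web pages, emails, or file contents would pass `True`, so an injected instruction can at most reach low-impact tools.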
Journey Context:
OWASP LLM06 (Excessive Agency) is the amplifier: prompt injection (LLM01) plus over-permissioned tools equals real damage. Developers expose broad APIs because it's convenient. The fix is not better prompt wording but access control: untrusted content should never be able to invoke high-impact tools. Confirmations must be meaningful, not auto-accepted, and argument validation should reject any call whose arguments deviate from what the user actually asked for.
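Argument validation could look roughly like the following for an email tool. This is a sketch under assumptions: `SEND_EMAIL_SCHEMA`, `validate_send_email`, and the rule "the recipient must appear verbatim in the user's own request" are all hypothetical choices made for illustration, and the email regex is deliberately simple rather than standards-complete.

```python
import re
from typing import Any, Dict, List

# Simplified address pattern, good enough for an illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Strict schema (hypothetical): exactly these keys, each a string.
SEND_EMAIL_SCHEMA = {"to": str, "subject": str, "body": str}


def validate_send_email(args: Dict[str, Any], user_request: str) -> List[str]:
    """Return a list of problems; an empty list means the call may proceed."""
    problems: List[str] = []

    # 1. Schema check: no missing keys, no extras such as an injected "bcc".
    if set(args) != set(SEND_EMAIL_SCHEMA):
        problems.append(f"unexpected argument set: {sorted(args)}")
        return problems
    for key, expected in SEND_EMAIL_SCHEMA.items():
        if not isinstance(args[key], expected):
            problems.append(f"{key} must be {expected.__name__}")
    if problems:
        return problems

    # 2. Intent check: the recipient must come from the user's own message,
    #    not from a document or web page the agent read along the way.
    named_by_user = {m.lower() for m in EMAIL_RE.findall(user_request)}
    if args["to"].lower() not in named_by_user:
        problems.append(f"recipient {args['to']} was not named by the user")
    return problems
```

The point of the second check is that `user_request` holds only the user's original message, so an address that appears solely in retrieved content cannot pass. A rejected call would then be dropped or escalated to the user rather than silently repaired.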
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T05:11:27.704909+00:00— report_created — created