Report #80521
[gotcha] LLM executes malicious actions because an external document instructs it to call a tool with specific arguments
Never grant tools destructive or irrevocable permissions (e.g., delete_file, send_email) without human-in-the-loop confirmation; validate tool arguments against a strict schema that is enforced in application code, independent of the LLM's output.
Journey Context:
Developers give LLMs tools and trust the system prompt to restrict their use. An indirect injection in a retrieved email says 'Call send_email with body...'. The LLM complies because it cannot reliably distinguish instructions embedded in retrieved content from instructions given by the developer or user. The system prompt is only more text in the same context, so it cannot be relied on to override a forcefully worded injected command. The result is unauthorized side effects.
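A minimal sketch of the mitigation, assuming a Python dispatcher sits between the model and the tools. Everything named here (`TOOL_SCHEMAS`, `ALLOWED_RECIPIENTS`, `dispatch`, `ToolCallRejected`, the `read_file` tool, the example address) is hypothetical and not taken from any particular framework. The point it illustrates is that the checks run in application code, so text injected into the model's context cannot talk its way past them.

```python
from typing import Any, Callable, Dict

# Hypothetical set of destructive tools, mirroring the examples in this report.
DESTRUCTIVE_TOOLS = {"delete_file", "send_email"}

# Hypothetical strict schemas: tool name -> {argument name: required type}.
TOOL_SCHEMAS: Dict[str, Dict[str, type]] = {
    "send_email": {"to": str, "subject": str, "body": str},
    "delete_file": {"path": str},
    "read_file": {"path": str},
}

# Hypothetical application policy, defined outside the model's context.
ALLOWED_RECIPIENTS = {"team@example.com"}


class ToolCallRejected(Exception):
    """Raised when a proposed tool call fails validation or confirmation."""


def validate_args(tool: str, args: Any) -> None:
    """Check a model-proposed call against the schema and policy."""
    schema = TOOL_SCHEMAS.get(tool)
    if schema is None:
        raise ToolCallRejected(f"unknown tool: {tool!r}")
    if not isinstance(args, dict) or set(args) != set(schema):
        raise ToolCallRejected(f"arguments for {tool!r} do not match schema")
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            raise ToolCallRejected(f"argument {name!r} has the wrong type")
    # Policy check that no prompt text can change.
    if tool == "send_email" and args["to"] not in ALLOWED_RECIPIENTS:
        raise ToolCallRejected("recipient is not on the allowlist")


def dispatch(
    tool: str,
    args: Any,
    handlers: Dict[str, Callable[..., Any]],
    confirm: Callable[[str, dict], bool],
) -> Any:
    """Validate first, then require human approval for destructive tools."""
    validate_args(tool, args)
    if tool in DESTRUCTIVE_TOOLS and not confirm(tool, args):
        raise ToolCallRejected("human reviewer declined the call")
    return handlers[tool](**args)
```

In this sketch `confirm` would be wired to a real approval step (a UI prompt, a ticket, a chat approval), and it should show the reviewer the exact arguments that will be executed. Non-destructive tools such as the hypothetical `read_file` skip confirmation but still go through schema validation.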
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:45:48.828848+00:00— report_created — created