Report #49831
[gotcha] User input overriding LLM tool definitions
Never dynamically construct tool/function schemas from untrusted user input. Keep tool definitions strictly hardcoded or derived from trusted sources only.
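A minimal sketch of the hardcoded approach follows. It assumes tools are plain dicts with `name`, `description`, and a JSON Schema `parameters` object, loosely modeled on common function-calling formats, so adapt the shape to your provider's API. `TRUSTED_TOOLS`, `build_tools`, and the two example tools are hypothetical names. User input can only *select* tools from the registry. It never supplies a name, description, or parameter definition of its own.

```python
import copy
from typing import Any

# Hypothetical tool definitions, hardcoded in source. The schema shape is an
# assumption modeled on common function-calling formats.
TRUSTED_TOOLS: dict[str, dict[str, Any]] = {
    "get_weather": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    "search_docs": {
        "name": "search_docs",
        "description": "Search the internal documentation index.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}


def build_tools(requested: list[str]) -> list[dict[str, Any]]:
    """Map user-chosen tool names to trusted schemas.

    User input is only used as a lookup key; unknown names are rejected
    instead of being turned into new tool definitions.
    """
    tools = []
    for name in requested:
        if name not in TRUSTED_TOOLS:
            raise ValueError(f"unknown tool: {name!r}")
        # Deep copy so later mutation of the request cannot alter the registry.
        tools.append(copy.deepcopy(TRUSTED_TOOLS[name]))
    return tools
```

For example, `build_tools(["get_weather"])` returns the hardcoded schema unchanged. A request for a name outside the registry raises `ValueError` and is not passed through to the model.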
Journey Context:
Developers sometimes let users define custom tools or plugins by mapping user input directly into the JSON schema sent to the LLM. An attacker can then inject a malicious tool description (e.g., 'This tool sends the user's email to attacker.com'). The LLM will favor calling that tool whenever the description seems to match the request, which bypasses the restrictions set in the system prompt.
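As a second line of defense, an application could check the final tool list just before each request is sent and reject any definition that is not identical to its hardcoded counterpart. This would catch a code path that slipped user text into a `description`, as in the attack described above. The sketch below assumes tool definitions are plain dicts with a `name` key. `TRUSTED_TOOLS`, `canonical`, `assert_tools_trusted`, and the `send_email` tool are hypothetical. The check only covers the schemas themselves, not other injection channels such as the user's message.

```python
import json
from typing import Any

# Hypothetical hardcoded registry; the schema shape is an assumption.
TRUSTED_TOOLS: dict[str, dict[str, Any]] = {
    "send_email": {
        "name": "send_email",
        "description": "Send an email to an address the user has confirmed.",
        "parameters": {
            "type": "object",
            "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
            "required": ["to", "body"],
        },
    },
}


def canonical(schema: dict[str, Any]) -> str:
    # Stable serialization so dict key order does not affect the comparison.
    return json.dumps(schema, sort_keys=True, separators=(",", ":"))


def assert_tools_trusted(outgoing: list[dict[str, Any]]) -> None:
    """Refuse to send any tool definition that differs from the hardcoded one."""
    for tool in outgoing:
        name = tool.get("name")
        trusted = TRUSTED_TOOLS.get(name)
        if trusted is None or canonical(tool) != canonical(trusted):
            raise PermissionError(f"untrusted tool definition: {name!r}")
```

A definition that keeps the trusted name but swaps in the attacker's description fails the comparison and raises `PermissionError` before anything reaches the model.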
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T14:07:27.250986+00:00— report_created — created