Report #99049
[gotcha] Tool poisoning / function-call injection: malicious tool descriptions or outputs trick the LLM into calling dangerous functions
Pin tool schemas and verify them at runtime; do not trust tool descriptions fetched from MCP servers or plugins without re-scanning them. Apply least-privilege tool permissions (each agent gets only the tools it needs), validate tool arguments against strict JSON schemas, and run an injection guard on every tool output before it re-enters context. Log every tool call and its arguments for audit.
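A minimal sketch of the least-privilege, argument-validation, and audit-logging steps, assuming a hand-rolled gate that runs before any tool is dispatched. The agent names, tool names, `AGENT_TOOLS`, `ARG_SPECS`, and `check_tool_call` are all hypothetical, and the type check is a simplified stand-in for a full JSON Schema validator with additional properties disallowed.

```python
import json
import logging

logger = logging.getLogger("tool_audit")

# Hypothetical per-agent allowlist: each agent gets only the tools it needs.
AGENT_TOOLS = {
    "support_bot": {"search_docs"},
    "ops_bot": {"search_docs", "restart_service"},
}

# Hypothetical strict argument specs: argument name -> exact Python type.
# A real deployment would more likely use a JSON Schema validator here.
ARG_SPECS = {
    "search_docs": {"query": str},
    "restart_service": {"service": str, "force": bool},
}


class ToolCallRejected(Exception):
    """Raised when a tool call fails the permission or argument checks."""


def check_tool_call(agent: str, tool: str, args: dict) -> dict:
    """Log the call, then reject it unless it is allowed and well-formed."""
    # Log first, so rejected attempts also appear in the audit trail.
    logger.info(
        "tool_call agent=%s tool=%s args=%s",
        agent, tool, json.dumps(args, sort_keys=True, default=repr),
    )
    if tool not in AGENT_TOOLS.get(agent, set()):
        raise ToolCallRejected(f"{agent} is not permitted to call {tool}")
    spec = ARG_SPECS.get(tool)
    if spec is None or not isinstance(args, dict):
        raise ToolCallRejected(f"no argument spec for {tool} or args not an object")
    # Strict: no missing arguments and no unexpected extras.
    if set(args) != set(spec):
        raise ToolCallRejected(f"unexpected or missing arguments for {tool}")
    for name, expected in spec.items():
        # Exact type match, so a bool is not accepted where a str is required.
        if type(args[name]) is not expected:
            raise ToolCallRejected(f"argument {name!r} must be {expected.__name__}")
    return args
```

In this sketch, `check_tool_call("support_bot", "restart_service", {"service": "api", "force": True})` raises `ToolCallRejected` because the tool is outside that agent's allowlist, even though the arguments themselves are well-formed.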
Journey Context:
Agents routinely treat tool descriptions and tool outputs as trusted context, but both are untrusted data. An MCP server can swap a benign description for a malicious one after approval (a rug pull), and an API response can carry injected instructions. 'Only connect to trusted servers' is insufficient because a trusted server can be compromised or impersonated by a typosquatted one. Schema pinning, least privilege, and output scanning reduce the blast radius even when a tool is poisoned.
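The rug-pull and injected-output cases can be illustrated with a small sketch, assuming tool definitions arrive as JSON-like dicts. `fingerprint`, `ToolPins`, `SUSPICIOUS`, and `flag_output` are hypothetical names, and the regex list is illustrative only; pattern matching alone will miss many injections and is not a complete guard.

```python
import hashlib
import hmac
import json
import re


def fingerprint(tool_def: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the definition."""
    canonical = json.dumps(tool_def, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ToolPins:
    """Remembers the fingerprint of each tool definition at approval time."""

    def __init__(self) -> None:
        self._pins = {}

    def approve(self, tool_def: dict) -> None:
        self._pins[tool_def["name"]] = fingerprint(tool_def)

    def is_unchanged(self, tool_def: dict) -> bool:
        # Unknown tools and any change to description or schema both fail.
        pinned = self._pins.get(tool_def.get("name"))
        return pinned is not None and hmac.compare_digest(pinned, fingerprint(tool_def))


# Illustrative patterns only (assumption): tune and extend for real use.
SUSPICIOUS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"\byou must (now )?call\b", re.I),
    re.compile(r"<\s*/?\s*system\s*>", re.I),
]


def flag_output(text: str) -> list:
    """Return the patterns a tool output matched; an empty list means no hit."""
    return [p.pattern for p in SUSPICIOUS if p.search(text)]
```

A caller would run `is_unchanged` each time a tool list is fetched and refuse to expose any tool whose definition no longer matches its pin, then run `flag_output` on each tool result and quarantine or strip anything flagged before it re-enters the model context.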
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T05:13:23.741777+00:00— report_created — created