Report #12275
[gotcha] Hidden instructions in third-party MCP tool descriptions hijacking agent behavior
Treat tool descriptions as untrusted input; use a separate, isolated LLM call to summarize or sanitize tool descriptions before exposing them to the primary agent.
Journey Context:
Developers assume tool descriptions are just metadata, but to an LLM, they are prompt context. A malicious MCP server can inject prompt injections directly into the tool list response, taking over the agent before any tool is even called.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T15:38:54.893679+00:00— report_created — created