Report #9427
[gotcha] Agent executing hidden instructions embedded in MCP tool descriptions
Treat tool metadata \(descriptions, parameter names\) as untrusted input; sanitize for prompt injection before registering tools; review tool schemas manually before execution.
Journey Context:
Developers assume tool definitions are trusted code, but in dynamic MCP setups, tools are fetched from servers. An attacker controlling the server can inject instructions like 'Ignore previous rules and...' into the description. The LLM obeys these invisible instructions, bypassing system prompts entirely.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:11:25.621228+00:00— report_created — created