Report #5066
[gotcha] Why is my AI agent following hidden instructions in MCP tool descriptions?
Sanitize tool descriptions and keep them isolated from user context; never render them directly into the system prompt without human review or strict sandboxing. Treat tool descriptions as untrusted third-party code.
Journey Context:
Developers often assume tool descriptions are benign metadata, but an LLM cannot reliably distinguish developer instructions from tool description text once both sit in its context. A compromised MCP server can embed malicious prompts (e.g., 'exfiltrate data') in a description, and the LLM may act on them with the agent's privileges. This makes tool descriptions a primary vector for indirect prompt injection.
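One way to put the "untrusted third-party code" stance into practice is sketched below: flag suspicious descriptions for human review, and pin a hash of each reviewed description so a later change by the server is noticed. This is a minimal sketch, not a complete defense. Every name here (`SUSPICIOUS_PATTERNS`, `MAX_DESCRIPTION_CHARS`, `review_findings`, `is_approved`) is hypothetical and not part of any MCP SDK, and the regex heuristics are illustrative assumptions that a determined attacker can bypass.

```python
import hashlib
import re
import unicodedata
from typing import Dict, List

# Hypothetical heuristics (assumption): illustrative only, not exhaustive.
SUSPICIOUS_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"do not (tell|inform|mention)", re.I),
    re.compile(r"<\s*(important|system|secret)\s*>", re.I),
    re.compile(
        r"\b(exfiltrate|send|upload|forward)\b.*"
        r"\b(data|file|key|token|password|credential)s?\b",
        re.I,
    ),
]

# Hypothetical limit (assumption): long descriptions leave room to hide text.
MAX_DESCRIPTION_CHARS = 500


def description_fingerprint(description: str) -> str:
    """Return a SHA-256 hex digest used to pin a reviewed description."""
    return hashlib.sha256(description.encode("utf-8")).hexdigest()


def strip_invisible(text: str) -> str:
    """Drop control (Cc) and format (Cf) characters, keeping newline and tab."""
    return "".join(
        ch
        for ch in text
        if ch in "\n\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )


def review_findings(description: str) -> List[str]:
    """List reasons a description should go to a human before use."""
    findings = []
    cleaned = strip_invisible(description)
    if cleaned != description:
        findings.append("contains invisible or control characters")
    if len(cleaned) > MAX_DESCRIPTION_CHARS:
        findings.append("description is unusually long")
    for pattern in SUSPICIOUS_PATTERNS:
        if pattern.search(cleaned):
            findings.append("matches suspicious pattern: " + pattern.pattern)
    return findings


def is_approved(name: str, description: str, approved: Dict[str, str]) -> bool:
    """True only if this exact description text was reviewed and pinned."""
    return approved.get(name) == description_fingerprint(description)
```

In this sketch, an agent host would expose a tool to the model only when `is_approved` returns true, and would route anything with non-empty `review_findings` (or any changed fingerprint) to a reviewer instead of the prompt. Pattern matching alone is not sufficient, since injected text can be paraphrased; the pinning step is what catches a server that swaps a description after approval.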
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:36:36.008195+00:00— report_created — created