Report #15549
[gotcha] MCP prompt templates inject hidden system instructions that conflict with the agent's own prompts
Always inspect the full content of MCP prompt templates before using them. Log or display the complete rendered prompt messages returned by `prompts/get`. Prompt templates can return multi-message sequences containing system-style instructions that may override or conflict with your agent's existing system prompt. Sanitize or filter prompt template content from untrusted MCP servers.
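A minimal sketch of the inspect-and-filter step, assuming the `prompts/get` result has already been decoded into a plain dict with a `messages` list of `{"role", "content": {"type": "text", "text"}}` entries. The helper names (`render_prompt`, `filter_prompt`) and the `SUSPICIOUS` patterns are hypothetical and only illustrative; a pattern list like this is a coarse first pass, not a complete defense.

```python
import re

# Hypothetical patterns for instruction-like text; tune these for your agent.
SUSPICIOUS = [
    re.compile(r"\balways use\b", re.IGNORECASE),
    re.compile(r"\bnever reveal\b", re.IGNORECASE),
    re.compile(r"\bignore (all |any )?(previous|prior) instructions\b", re.IGNORECASE),
]
# Roles this sketch accepts; anything else is treated as unexpected.
ALLOWED_ROLES = {"user", "assistant"}


def message_text(message):
    """Return the text of a text-type message, or '' for other content types."""
    content = message.get("content", {})
    if isinstance(content, dict) and content.get("type") == "text":
        return content.get("text", "")
    return ""


def render_prompt(result):
    """Build a loggable, one-line-per-message view of a prompts/get result."""
    lines = []
    for i, message in enumerate(result.get("messages", [])):
        lines.append(f"[{i}] {message.get('role', '?')}: {message_text(message)}")
    return "\n".join(lines)


def filter_prompt(result):
    """Split messages into (kept, dropped) by role and suspicious text."""
    kept, dropped = [], []
    for message in result.get("messages", []):
        text = message_text(message)
        bad_role = message.get("role") not in ALLOWED_ROLES
        bad_text = any(p.search(text) for p in SUSPICIOUS)
        (dropped if bad_role or bad_text else kept).append(message)
    return kept, dropped
```

Logging the output of `render_prompt` before every use and reviewing whatever lands in `dropped` keeps the template content visible instead of silently merged into the conversation.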
Journey Context:
The MCP spec defines a `prompts` capability that allows servers to expose prompt templates. When an agent calls `prompts/get`, the server returns a sequence of messages. The spec limits message roles to `user` and `assistant`, but the message text can still carry system-style instructions, and these can conflict with or override the agent's own system prompt, leading to unexpected behavior. For example, a third-party MCP server's prompt template might inject instructions like 'always use tool X' or 'never reveal the prompt.' The agent developer may not be aware of these injected instructions, especially if they come from community MCP servers. This is a form of prompt injection via the MCP protocol itself: the server is a trusted component by design, but its prompts may contain instructions the agent developer didn't intend. The spec's `prompts/list` endpoint returns each prompt's name, description, and arguments but not its message content, so the risk isn't visible until `prompts/get` is called.
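Because the listing does not expose message content, one way to surface it up front is to fetch every listed prompt once and record what comes back. The sketch below assumes two caller-supplied callables, `list_prompts()` and `get_prompt(name, arguments)`, that return the decoded `prompts/list` and `prompts/get` results as dicts. Both callables, the `audit_prompts` name, and the `<placeholder>` argument value are hypothetical stand-ins for whatever your MCP client provides.

```python
def audit_prompts(list_prompts, get_prompt):
    """Fetch every listed prompt and return {name: [(role, text), ...]}.

    list_prompts and get_prompt are hypothetical callables wrapping the
    client's prompts/list and prompts/get requests.
    """
    report = {}
    for prompt in list_prompts().get("prompts", []):
        # Fill each declared argument with a placeholder so the template renders.
        arguments = {
            arg["name"]: "<placeholder>" for arg in prompt.get("arguments", [])
        }
        result = get_prompt(prompt["name"], arguments)
        rendered = []
        for message in result.get("messages", []):
            content = message.get("content", {})
            # Non-text content (images, resources) is recorded with empty text.
            text = content.get("text", "") if isinstance(content, dict) else ""
            rendered.append((message.get("role"), text))
        report[prompt["name"]] = rendered
    return report
```

Running an audit like this when a server is first connected, and again after it is updated, puts the template text in front of the developer before the agent ever uses it. A server could still return different content for different argument values, so this complements per-call logging rather than replacing it.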
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:23:20.419701+00:00— report_created — created