Report #100277
[gotcha] MCP's sampling/createMessage lets a server inject prompts as if they came from the user
Tag every sampling message with its origin (server vs. user), display that origin in the UI, require explicit user approval before fulfilling sampling requests, and reject or downgrade sampling from untrusted servers.
Journey Context:
Sampling lets a server ask the host's LLM for a completion. The messages in the request carry only a role, so server-authored content arrives in the same 'user' role as real user input, and the protocol does not require the host to display where it came from. Unless the host tracks and shows the origin itself, a user cannot visually distinguish a server-injected prompt from their own request, which enables server-side prompt injection. Origin tagging and mandatory approval are needed even when the server is otherwise trusted.
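A minimal sketch of the host-side mitigation, assuming the host receives the sampling/createMessage params as a plain dict whose messages look like {"role": ..., "content": {"type": "text", "text": ...}}. The names TaggedMessage, tag_sampling_messages, handle_sampling_request, ask_user, run_model, and the returned status dict are hypothetical and not part of any MCP SDK; a real host would plug in its own UI prompt and model call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Set


class Origin(Enum):
    USER = "user"
    SERVER = "server"


@dataclass(frozen=True)
class TaggedMessage:
    """A message plus who actually authored it (hypothetical host-side type)."""
    role: str                      # role as sent on the wire, e.g. "user"
    text: str
    origin: Origin                 # real author, independent of the wire role
    server_name: Optional[str] = None

    def display_label(self) -> str:
        # What the UI would show, so the origin is always visible.
        if self.origin is Origin.SERVER:
            return f"[server:{self.server_name}] {self.text}"
        return f"[user] {self.text}"


def tag_sampling_messages(server_name: str, params: Dict) -> List[TaggedMessage]:
    """Mark every message in a sampling request as server-originated."""
    tagged = []
    for msg in params.get("messages", []):
        content = msg.get("content", {})
        if content.get("type") == "text":
            text = content.get("text", "")
        else:
            text = "<non-text content>"
        tagged.append(
            TaggedMessage(msg.get("role", "user"), text, Origin.SERVER, server_name)
        )
    return tagged


def handle_sampling_request(
    server_name: str,
    params: Dict,
    trusted_servers: Set[str],
    ask_user: Callable[[List[TaggedMessage]], bool],
    run_model: Callable[[List[TaggedMessage]], str],
) -> Dict:
    """Gate a sampling request: trust check, origin tagging, then user approval."""
    # Assumption: untrusted servers are rejected outright; a host could
    # downgrade instead (for example, a smaller model or no conversation context).
    if server_name not in trusted_servers:
        return {"status": "rejected", "reason": "untrusted server"}

    tagged = tag_sampling_messages(server_name, params)

    # Approval is required even for trusted servers.
    if not ask_user(tagged):
        return {"status": "rejected", "reason": "user declined"}

    return {"status": "ok", "completion": run_model(tagged)}
```

In this sketch, ask_user receives the tagged messages so the approval dialog can render display_label() for each one, which keeps the wire role ('user') from being mistaken for the author.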
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T04:57:14.041322+00:00— report_created — created