Report #2231
[gotcha] MCP server uses sampling/createMessage to manipulate the host LLM or exfiltrate context
Always surface sampling prompts to the user for review before they are sent to the model; inspect and, where needed, filter the returned completion before it goes back to the server; scope sampling to trusted servers; implement rate limits, timeouts, and audit logging; never auto-approve server-initiated LLM calls.
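One way to scope sampling to trusted servers is to advertise the capability only on connections to those servers. The sketch below assumes the client builds its capabilities object per connection; the function name and the trusted-server set are hypothetical, and only the "sampling" capability key comes from the MCP specification.

```python
def client_capabilities(server_name: str, trusted_for_sampling: set) -> dict:
    """Build the capabilities object sent in `initialize` for one server.

    Hypothetical helper: `server_name` and `trusted_for_sampling` are
    illustrative names, not part of any SDK.
    """
    caps: dict = {}
    # Advertise sampling only to servers the user has explicitly trusted.
    # A server that never sees this capability should not send
    # sampling/createMessage requests at all.
    if server_name in trusted_for_sampling:
        caps["sampling"] = {}
    return caps


trusted = {"docs-server"}
print(client_capabilities("docs-server", trusted))
print(client_capabilities("unknown-server", trusted))
```

Withholding the capability is a first line of defense only; a client should still reject any sampling request that arrives from a connection where it was not declared.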
Journey Context:
Sampling lets a server request an LLM completion from the client, reversing the normal control flow. The server crafts the full prompt and sees the response, which creates a channel for both prompt injection and data exfiltration. The spec says there SHOULD always be a human in the loop with the ability to deny sampling requests. Researchers have demonstrated malicious servers using sampling to inject hidden instructions. Treat sampling as a privileged capability: declare it only when needed, review every request, and return only what the user approved.
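The review flow described above could be sketched as a small client-side gate. This is a minimal illustration under stated assumptions, not an SDK API: SamplingGate, render_request, ask_user, and call_llm are hypothetical names, and the request is modeled as a plain dict shaped like sampling/createMessage params (systemPrompt, messages). Timeouts are omitted for brevity.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


def render_request(params: dict) -> str:
    """Flatten sampling/createMessage params into text the user can read."""
    lines = []
    if params.get("systemPrompt"):
        lines.append(f"[system] {params['systemPrompt']}")
    for msg in params.get("messages", []):
        content = msg.get("content", {})
        # Non-text content is shown as a placeholder rather than hidden.
        text = content.get("text", f"<{content.get('type', 'unknown')} content>")
        lines.append(f"[{msg.get('role', '?')}] {text}")
    return "\n".join(lines)


@dataclass
class SamplingGate:
    """Hypothetical gate: trust check, rate limit, human review, audit log."""
    trusted_servers: set
    ask_user: Callable[[str, str], bool]   # (server, text shown) -> approved?
    call_llm: Callable[[dict], str]        # request params -> completion text
    max_per_minute: int = 5
    audit_log: list = field(default_factory=list)
    _recent: dict = field(default_factory=dict)

    def _record(self, server: str, outcome: str, now: float) -> None:
        self.audit_log.append({"time": now, "server": server, "outcome": outcome})

    def _rate_ok(self, server: str, now: float) -> bool:
        # Sliding 60-second window of accepted request timestamps per server.
        window = [t for t in self._recent.get(server, []) if now - t < 60.0]
        self._recent[server] = window
        if len(window) >= self.max_per_minute:
            return False
        window.append(now)
        return True

    def handle(self, server: str, params: dict,
               now: Optional[float] = None) -> Optional[str]:
        """Return the approved completion, or None if the request is denied."""
        now = time.time() if now is None else now
        if server not in self.trusted_servers:
            self._record(server, "untrusted server", now)
            return None
        if not self._rate_ok(server, now):
            self._record(server, "rate limit exceeded", now)
            return None
        # First review: the user sees the full prompt, including systemPrompt.
        if not self.ask_user(server, render_request(params)):
            self._record(server, "prompt denied by user", now)
            return None
        completion = self.call_llm(params)
        # Second review: the user sees the completion before the server does.
        if not self.ask_user(server, completion):
            self._record(server, "completion denied by user", now)
            return None
        self._record(server, "approved", now)
        return completion
```

A caller would construct one gate per client, pass each incoming request through handle, and answer the server with an error whenever it returns None. Reviewing the completion separately from the prompt is the design choice that matters here: approving a prompt says nothing about what the model returns to the server.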
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T10:09:44.522701+00:00— report_created — created