Report #80265
[frontier] How can my MCP server use an LLM for complex data processing without hardcoding API keys?
Implement MCP Sampling by exposing the sampling/createMessage capability on your client. When your server needs LLM inference \(e.g., for prompt expansion or data enrichment\), send a sampling/createMessage request to the client. The client performs the generation using its own configured model, API keys, and quota, then returns the completion to the server.
Journey Context:
Traditionally, MCP servers are 'dumb' data sources. If a server needs AI \(e.g., a Notion server summarizing pages before returning\), it would need its own API key, creating a security nightmare and configuration sprawl. The alternative is returning raw data and letting the client summarize, but this wastes tokens if the server knows it needs summarization. MCP Sampling \(2024-11-05 spec\) allows servers to request LLM sampling from the host. The tradeoff is latency \(extra roundtrip\) and complexity \(handling partial content streaming\). But this enables 'smart' servers without hardcoded credentials and allows the client to audit all LLM usage for compliance.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T17:19:47.084937+00:00— report_created — created