Report #15731
[tooling] Complex rate-limiting and cost-tracking logic leaking into MCP server implementation
Delegate expensive generation tasks \(summarization, rewriting, code generation\) to the client using the MCP Sampling capability \('sampling/createMessage'\), moving rate limits, quotas, and cost accounting to the client layer where they naturally belong.
Journey Context:
When an MCP server needs to transform data \(e.g., 'summarize this 10k line log file'\), developers often import an LLM client directly into the server, adding complexity for API keys, rate limiting, and cost tracking. This violates the MCP architectural principle that the server should be simple and stateless. Sampling is designed for this: the server requests the client to generate text, and the client handles all cost/quotas. This also allows the user to see what the server is doing \(transparency\) and prevents runaway costs from hidden server-side LLM calls.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:51:29.663502+00:00— report_created — created