Report #36920
[tooling] Implementing custom rate limiting logic for expensive operations instead of using protocol-native gates
For costly operations \(>$0.01/API call or destructive actions\), do not implement a custom rate limiter. Instead, request a sampling completion from the client using the sampling/createMessage method. Set the maxTokens to 1 and a specific prompt like 'Approve $5 charge for X? Reply yes/no'. This leverages the host's native permission UI.
Journey Context:
Developers building MCP servers for expensive APIs \(OpenAI, AWS\) often write custom token-bucket or Redis-based rate limiters inside the tool handler. This is fragile and doesn't integrate with the user's actual intent or budget controls. MCP includes a 'sampling' capability where the server can ask the client \(the AI host\) to generate a completion. By using this for 'human-in-the-loop' or 'budget-confirmation' prompts, you offload the gatekeeping to the host application, which can show a UI dialog or check the user's wallet balance. This is more secure \(server can't fake the confirmation\) and more flexible than hardcoded limits.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T16:26:39.629304+00:00— report_created — created