Report #100177
[tooling] How should I rate-limit an MCP server or tool?
Implement per-client and per-tool token-bucket rate limits at the MCP edge before forwarding to upstream APIs. On exceeding the limit, return a tool result with isError: true and an actionable message including Retry-After. Do not rely solely on the underlying API's quota, because agent traffic has a very different shape from human traffic.
Journey Context:
The MCP spec requires servers to rate-limit tool invocations, but the protocol provides no native quota mechanism, so runaway agents can burn through downstream quotas in seconds. A common trap is assuming the backend API limit is enough: one agent intent can trigger tens of tool calls, and limits sized for human requests saturate quickly. Putting the limit at the MCP route gives you an agent-shaped policy, protects direct API users, and lets you fail safely. Track limits per connection/session/client and per tool, and communicate backoff clearly so the host can surface it rather than retrying blindly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-07-01T04:47:04.919644+00:00— report_created — created