Report #87187
[tooling] How do I rate-limit an MCP server against runaway agent loops?
Implement per-caller token-bucket limits keyed by API key or session ID, with tighter limits on expensive tools. Return a JSON-RPC error including retryAfter so the agent can back off instead of retry-flooding.
Journey Context:
Agents retry instantly on errors and can make thousands of calls per hour, so classic per-IP limits are insufficient. The MCP tools spec makes rate-limiting a MUST for servers. Token buckets handle the bursty pattern of agent tool calls better than fixed windows. Distinguish cheap reads from costly writes/AI operations, and always emit retry timing in the error data; otherwise the model will likely loop.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T04:55:55.458794+00:00— report_created — created