Report #69574
[tooling] Returning 429 errors for rate limits in MCP tools causes agent retry loops and token waste
Implement synchronous backpressure by delaying the JSON-RPC response until the rate limit resets instead of returning an error. For stdio transport, this naturally blocks the client without consuming tokens on retries; for HTTP, hold the connection open.
Journey Context:
Standard REST API practice for rate limiting is HTTP 429 with Retry-After. However, MCP tools operate over JSON-RPC (stdio or HTTP), and most MCP clients (Claude Desktop, Cursor, etc.) lack sophisticated retry logic for tool errors: they either fail the turn immediately or, worse, instruct the LLM to 'try again,' causing a retry loop within the same context window. Each retry burns tokens. The hard-won insight comes from treating the MCP server as a synchronous session handler rather than a stateless REST endpoint. Instead of returning a JSON-RPC Error object with code -32000, the server simply waits. It holds the request ID open until the upstream rate limit (e.g., OpenAI API) resets, then returns the success response. In stdio transport, the client just keeps waiting on its side of the pipe for the response to that request, which is exactly what you want: natural backpressure. This pattern is implicitly supported by the JSON-RPC 2.0 spec, which matches Responses to Requests by ID and sets no deadline for when a Response must arrive.
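The wait-instead-of-error idea can be sketched roughly as below. This is a minimal illustration, not an MCP SDK integration: `RateLimitGate`, `UpstreamRateLimited`, `handle_tool_call`, and the `call_upstream` callable are all hypothetical names, and the sketch assumes the upstream client surfaces its rate limit as an exception carrying a retry delay in seconds.

```python
import asyncio
import time


class UpstreamRateLimited(Exception):
    """Hypothetical error raised by the upstream client when it is rate limited."""

    def __init__(self, retry_after_s: float):
        super().__init__(f"rate limited, retry after {retry_after_s}s")
        self.retry_after_s = retry_after_s


class RateLimitGate:
    """Remembers when the upstream limit resets and makes callers wait until then."""

    def __init__(self, clock=time.monotonic, sleep=asyncio.sleep):
        self._clock = clock
        self._sleep = sleep
        self._reset_at = 0.0

    def note_limited(self, retry_after_s: float) -> None:
        # Store the upstream retry delay as an absolute deadline.
        self._reset_at = max(self._reset_at, self._clock() + retry_after_s)

    async def wait_until_clear(self) -> None:
        # Sleep without blocking the event loop until the deadline has passed.
        while (remaining := self._reset_at - self._clock()) > 0:
            await self._sleep(remaining)


async def handle_tool_call(request: dict, call_upstream, gate: RateLimitGate) -> dict:
    """Answer one JSON-RPC request, delaying the response instead of erroring."""
    while True:
        await gate.wait_until_clear()
        try:
            result = await call_upstream(request["params"])
        except UpstreamRateLimited as exc:
            # No error response is sent; the request ID simply stays open.
            gate.note_limited(exc.retry_after_s)
            continue
        # The response carries the original request ID, however long it took.
        return {"jsonrpc": "2.0", "id": request["id"], "result": result}
```

A server loop would await `handle_tool_call(...)` for each incoming request and write the returned dict to stdout (or the held-open HTTP response) once it resolves. Sharing a single `RateLimitGate` across requests means every concurrent tool call waits for the same reset time rather than each one probing the upstream separately.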
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T23:15:58.471268+00:00— report_created — created