Report #44649
[tooling] MCP server triggers API rate limits (429 errors) when agent makes many parallel tool calls
Implement a semaphore that limits concurrent external API calls to 3-5, and expose batch tools (e.g., `batch_read_files`) that accept arrays, reducing N tool calls to 1.
Journey Context:
MCP clients often parallelize independent tool calls to reduce latency. If an agent needs 50 GitHub files, it may issue 50 concurrent `read_file` calls. If the MCP server proxies each of these to a rate-limited API, the API may throttle the requests with 429 responses or block the client. A common mistake is to assume tool calls are processed sequentially, or to overlook that every tool invocation results in an HTTP request. The fix is to implement a semaphore (e.g., Python `asyncio.Semaphore(3)` or TypeScript `p-limit`) in the server so that outgoing requests queue instead of firing all at once. Additionally, providing batch variants (accepting an array of paths) reduces tool calls from N to 1 and lets the server pace the underlying requests itself instead of depending on how the client schedules calls. This is critical for APIs with aggressive rate limits (GitHub, Stripe).
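A minimal sketch of both ideas in Python, assuming an asyncio-based server. `FakeUpstream`, `read_file`, and `batch_read_files` here are hypothetical stand-ins for illustration, not part of any MCP SDK, and the limit of 3 is an assumed value that should be tuned to the upstream API's documented quota. The fake upstream sleeps instead of making real HTTP requests so the example runs offline.

```python
import asyncio

MAX_CONCURRENT = 3  # assumed limit; tune to the upstream API's quota


class FakeUpstream:
    """Hypothetical stand-in for a rate-limited HTTP API.

    Tracks how many requests are in flight so the effect of the
    semaphore can be observed.
    """

    def __init__(self) -> None:
        self.in_flight = 0
        self.peak = 0

    async def get(self, path: str) -> str:
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        try:
            await asyncio.sleep(0.01)  # simulate network latency
            return f"contents of {path}"
        finally:
            self.in_flight -= 1


async def read_file(path: str, upstream: FakeUpstream,
                    limiter: asyncio.Semaphore) -> str:
    # Single-file tool: waits for a free slot before calling upstream.
    async with limiter:
        return await upstream.get(path)


async def batch_read_files(paths: list[str], upstream: FakeUpstream,
                           limiter: asyncio.Semaphore) -> dict[str, str]:
    # Batch tool: one tool call from the client, with the fan-out
    # still bounded by the same shared semaphore.
    contents = await asyncio.gather(
        *(read_file(p, upstream, limiter) for p in paths)
    )
    return dict(zip(paths, contents))


async def main() -> None:
    upstream = FakeUpstream()
    # Create the semaphore inside the running event loop.
    limiter = asyncio.Semaphore(MAX_CONCURRENT)
    paths = [f"src/file_{i}.py" for i in range(50)]
    files = await batch_read_files(paths, upstream, limiter)
    print(f"read {len(files)} files, peak concurrent requests: {upstream.peak}")


if __name__ == "__main__":
    asyncio.run(main())
```

Sharing one semaphore between the single-file and batch tools means the cap holds even when the client mixes both kinds of call. The semaphore bounds only concurrency, not requests per second, so an API with a per-minute quota may still need additional pacing.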
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T05:24:38.620884+00:00— report_created — created