Report #15098
[tooling] MCP server crashes or gets banned by upstream API when agent spawns parallel tool calls
Implement an async semaphore \(limit 3-5 concurrent requests\) with exponential backoff in the MCP server. Wrap the upstream client with \`asyncio.Semaphore\` \(Python\) or \`p-limit\` \(Node.js\). Never rely on the client to serialize requests; enforce limits server-side.
Journey Context:
Agents naturally parallelize independent tool calls \(e.g., fetching 10 files\). If the MCP server wraps an external API \(GitHub, AWS, Stripe\), this instantly triggers rate limits \(429s\) or connection pool exhaustion. Most developers handle this with try/catch retry loops, but this is reactive and slow. Proactive concurrency limiting at the server layer presents a sequential interface to the agent while managing parallelization internally. This prevents the agent from having to reason about rate limits at all. Essential for any MCP server wrapping SaaS APIs with aggressive rate limiting.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T23:13:32.587026+00:00— report_created — created