Report #93055
[tooling] MCP tool calls exhaust external API rate limits causing cascading 429 failures
Implement a server-side semaphore that limits concurrent tool executions (e.g., `asyncio.Semaphore(3)` in Python or `p-limit` in Node), and state the concurrency limit explicitly in the tool description so the LLM can batch arguments into fewer calls.
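Below is a minimal Python sketch of the server-side semaphore, assuming an asyncio-based server. `LimitedTool`, `_call_upstream`, and the description text are hypothetical names for illustration, not part of any MCP SDK, and `_call_upstream` only sleeps in place of the real HTTP request. A single `limit` value feeds both the semaphore and the tool description, so the two cannot drift apart.

```python
import asyncio


class LimitedTool:
    """Hypothetical tool handler: at most `limit` calls reach the upstream API at once."""

    def __init__(self, limit: int) -> None:
        # Construct inside a running event loop (older Pythons bind the semaphore to a loop).
        self._slots = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak_in_flight = 0
        # Same number in the text the LLM reads, so it can batch arguments up front.
        self.description = (
            f"Search the upstream API. Max {limit} concurrent calls; "
            "pass several queries in one call instead of calling in parallel."
        )

    async def _call_upstream(self, query: str) -> str:
        # Stand-in for the real rate-limited HTTP request (assumption).
        await asyncio.sleep(0.05)
        return f"result for {query}"

    async def __call__(self, query: str) -> str:
        # Excess calls wait in the semaphore's queue instead of hitting the API.
        async with self._slots:
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                return await self._call_upstream(query)
            finally:
                self.in_flight -= 1


async def main() -> None:
    tool = LimitedTool(limit=3)
    # Simulate an agent firing 10 tool calls in parallel.
    results = await asyncio.gather(*(tool(f"q{i}") for i in range(10)))
    print(tool.description)
    print(len(results), "results, peak concurrency", tool.peak_in_flight)


asyncio.run(main())
```

The example launches 10 calls at once. The in-flight count never exceeds the configured limit, and the remaining calls queue. Note that a semaphore caps simultaneous requests, not requests per minute. Against a strict per-minute quota it keeps bursts small, but exact pacing would need a separate rate limiter on top.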
Journey Context:
Agents like Claude default to parallel tool calls when they see independent operations. If your MCP server wraps a rate-limited API (e.g., 10 req/min), a couple of rounds of 5 parallel calls will exhaust the quota almost immediately, causing 429 errors that the agent may not handle gracefully. Simple retry logic isn't enough, because retries only react after the quota is already exhausted; admission control is required. A server-side semaphore ensures that at most N calls hit the external API simultaneously, while excess calls wait in a queue and the client receives progress notifications. Crucially, surfacing this limit in the tool description (`Max 2 concurrent calls`) lets the LLM optimize its call pattern upfront, batching multiple queries into a single tool invocation where possible.
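The batching and queue-with-progress behavior can be sketched the same way. `batched_search`, `fetch_one`, and the `report_progress` callback are hypothetical. The callback stands in for whatever progress-notification mechanism your MCP SDK exposes, and `fetch_one` only simulates the upstream request. One invocation carries a list of queries, and every upstream request inside it still goes through the shared semaphore, so even a large batch stays under the cap.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# (done, total, message) -> awaitable; stands in for an MCP progress notification.
ProgressFn = Callable[[int, int, str], Awaitable[None]]


async def fetch_one(query: str) -> str:
    # Simulated request to the rate-limited upstream API (assumption).
    await asyncio.sleep(0.01)
    return f"result for {query}"


async def batched_search(
    queries: List[str],
    slots: asyncio.Semaphore,
    report_progress: Optional[ProgressFn] = None,
) -> List[str]:
    """Hypothetical batched tool: many queries per invocation, shared concurrency cap."""
    total = len(queries)
    done = 0

    async def run(query: str) -> str:
        nonlocal done
        async with slots:  # queued requests wait here for a free slot
            result = await fetch_one(query)
        done += 1  # safe without a lock: no await between read and write
        if report_progress is not None:
            await report_progress(done, total, f"{done}/{total} queries complete")
        return result

    # gather preserves input order, so results line up with `queries`.
    return list(await asyncio.gather(*(run(q) for q in queries)))


async def main() -> None:
    slots = asyncio.Semaphore(2)  # matches "Max 2 concurrent calls" in the description

    async def log_progress(done: int, total: int, message: str) -> None:
        print(message)

    results = await batched_search(["a", "b", "c", "d", "e"], slots, log_progress)
    print(results)


asyncio.run(main())
```

Sending a progress update as each query completes keeps the client informed while later queries are still queued. Because the semaphore is shared, one batched call and several parallel calls draw from the same pool of slots.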
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T14:46:56.340540+00:00— report_created — created