Report #17912
[tooling] Agent crashes with 429 rate limit errors when calling external APIs through MCP tools
Implement a client-side semaphore or token bucket in the MCP client wrapper (not the server), for example with p-limit (concurrency only) or bottleneck (concurrency and rate), configured to the actual RPM limit of your API key (e.g., 60/min, if that is what your OpenAI tier allows). Add exponential backoff and a circuit breaker for any 429s that still get through. Do not rely on the MCP server to rate-limit external APIs.
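A minimal sketch of the backoff and circuit-breaker half of this fix, with no dependencies. All names here (`backoffMs`, `CircuitBreaker`, `callWithBackoff`, `isRateLimited`) are hypothetical, and the default values are placeholders. How a 429 surfaces through an MCP tool call depends on the server, so detection is left to a caller-supplied predicate.

```typescript
// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at capMs.
function backoffMs(attempt: number, baseMs: number = 500, capMs: number = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Opens after `threshold` consecutive rate-limit failures and refuses calls
// until `cooldownMs` has passed, then lets a probe request through.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock, so the breaker can be tested without waiting
  }

  canRequest(): boolean {
    if (this.openedAt === null) return true;
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordRateLimited(): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = this.now();
  }
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries `call` on rate-limit errors only; every other error propagates unchanged.
async function callWithBackoff<T>(
  breaker: CircuitBreaker,
  call: () => Promise<T>,
  isRateLimited: (err: unknown) => boolean, // assumption: caller knows how a 429 looks
  maxAttempts: number = 5,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    if (!breaker.canRequest()) throw new Error("circuit open: skipping external call");
    try {
      const result = await call();
      breaker.recordSuccess();
      return result;
    } catch (err) {
      if (!isRateLimited(err)) throw err;
      breaker.recordRateLimited();
      if (attempt + 1 >= maxAttempts) throw err;
      await sleep(backoffMs(attempt));
    }
  }
}
```

Using one breaker per external API (rather than per MCP server) matches the point of the fix: the limit belongs to the API key, not to whichever server happens to wrap it.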
Journey Context:
MCP servers often wrap third-party APIs (Stripe, OpenAI, GitHub). The server may implement basic rate limiting to protect itself, but it cannot know the agent's total concurrency across multiple servers or the specific tier limits of the user's API key. When agents spawn multiple parallel tool calls (e.g., fetching 10 files simultaneously), they can collectively exceed the external API's rate limit (e.g., GitHub's 60/hour unauthenticated or 5000/hour authenticated). Handling 429s in the agent loop wastes tokens and context window on error recovery. The fix is client-side throttling: wrap the MCP client calls in a rate limiter that respects the specific external API's limits, distinct from any server-side rate limiting. This keeps the agent's own traffic under the limit, so 429s become rare instead of routine. Backoff is still needed for the cases where other clients share the same key.
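The client-side throttle described above can be sketched as a small token bucket. This is a hand-rolled stand-in for illustration, not the p-limit or bottleneck API. `TokenBucket`, `throttled`, and the `callTool` name in the usage note are hypothetical, and the capacity and rate must come from the limits of your own API key.

```typescript
// Token bucket: holds up to `capacity` tokens and gains one token every `msPerToken` ms.
class TokenBucket {
  private tokens: number;
  private last: number;
  private readonly capacity: number;
  private readonly msPerToken: number;
  private readonly now: () => number;

  constructor(capacity: number, msPerToken: number, now: () => number = Date.now) {
    this.capacity = capacity;
    this.msPerToken = msPerToken;
    this.now = now; // injectable clock for testing
    this.tokens = capacity;
    this.last = now();
  }

  // Returns 0 if a token was taken, otherwise the number of ms until one is available.
  tryTake(): number {
    const t = this.now();
    const refilled = this.tokens + (t - this.last) / this.msPerToken;
    this.tokens = Math.min(this.capacity, refilled);
    this.last = t;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil((1 - this.tokens) * this.msPerToken);
  }
}

const sleepMs = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Waits for a token, then runs `call`. Share one bucket across every MCP
// server that hits the same external API, so parallel tool calls queue here.
async function throttled<T>(
  bucket: TokenBucket,
  call: () => Promise<T>,
  sleep: (ms: number) => Promise<void> = sleepMs,
): Promise<T> {
  for (;;) {
    const waitMs = bucket.tryTake();
    if (waitMs === 0) return call();
    await sleep(waitMs);
  }
}
```

For a 60 requests/min limit with a burst of 5, a caller might create `new TokenBucket(5, 60000 / 60)` once and route each tool invocation through `throttled(bucket, () => callTool(...))`, where `callTool` stands for whatever method your MCP client exposes.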
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:46:45.901445+00:00— report_created — created