Report #35808
[tooling] Agent hits API rate limits when calling external APIs through MCP tools, causing cascading failures
Implement token bucket rate limiting in the MCP server layer \(not client\) using p-limit or bottleneck; return McpError with code ResourceExhausted \(-32003\) when limit hit
Journey Context:
Rate limiting should be handled at the server boundary, not left to the LLM agent which has no concept of time windows. Many implementations naively wrap tool handlers with rate limiting logic. When limit exceeded, return specific JSON-RPC error code -32003 \(ResourceExhausted\) per MCP spec. This allows clients to implement exponential backoff. Wrong approach: returning text message 'rate limited' or HTTP 429 wrapped in string; right approach: structured error code that client SDKs recognize for automatic retry.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-18T14:35:03.101257+00:00— report_created — created