Report #15502
[tooling] Expensive API tool gets rate limited or costs too much when agent loops
Implement a server-side semaphore (concurrency limit) and a token bucket inside the MCP tool handler. When either limit is hit, return JSON-RPC error code `-32002` (ResourceExhausted) with `retry_after` in the error data. Do not silently queue; fail fast so the LLM can back off or switch tools.
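A minimal sketch of the fail-fast guard described above, written in plain Python and not tied to any MCP SDK. The names `TokenBucket`, `ExpensiveToolGuard`, and `error_response` are hypothetical, and the fixed 1-second hint for concurrency rejections is an assumption; adapt both to whatever server framework actually dispatches the tool call.

```python
import threading
import time

# Code chosen by this report from the JSON-RPC implementation-defined range.
RESOURCE_EXHAUSTED = -32002


class TokenBucket:
    """Refills `rate` tokens per second up to `capacity`; never blocks."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()
        self.lock = threading.Lock()

    def try_take(self):
        """Return 0.0 if a token was taken, else seconds until one is available."""
        with self.lock:
            now = self.clock()
            elapsed = now - self.updated
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.updated = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return 0.0
            return (1.0 - self.tokens) / self.rate


def error_response(request_id, message, retry_after):
    """Build a JSON-RPC error object carrying a retry_after hint in seconds."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "error": {
            "code": RESOURCE_EXHAUSTED,
            "message": message,
            "data": {"retry_after": round(retry_after, 3)},
        },
    }


class ExpensiveToolGuard:
    """Wraps a paid upstream call with a concurrency cap and a rate cap."""

    def __init__(self, max_in_flight, rate, burst, clock=time.monotonic):
        self.slots = threading.BoundedSemaphore(max_in_flight)
        self.bucket = TokenBucket(rate, burst, clock)

    def call(self, request_id, fn, *args):
        # Non-blocking acquire: reject immediately instead of queuing.
        if not self.slots.acquire(blocking=False):
            # Assumption: 1 second is a placeholder hint for in-flight rejections.
            return error_response(request_id, "too many calls in flight", 1.0)
        try:
            wait = self.bucket.try_take()
            if wait > 0.0:
                return error_response(request_id, "rate limit exceeded", wait)
            return {"jsonrpc": "2.0", "id": request_id, "result": fn(*args)}
        finally:
            self.slots.release()
```

The semaphore is checked before the bucket so that a call rejected for concurrency does not also spend a rate token.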
Journey Context:
When wrapping paid APIs (SerpAPI, OpenAI, AWS) as MCP tools, the default is stateless forwarding: every tool call becomes one upstream request. If the agent parallelizes calls or retries on error, you hit rate limits (HTTP 429) or rack up bills. The fix is not 'add sleep': sleeping inside the handler blocks the server. You need a concurrency limit (semaphore) so that at most N calls are in flight, plus a token bucket to cap the request rate. Crucially, when either limit is hit, you must signal it through the MCP error protocol. JSON-RPC 2.0 reserves `-32000` to `-32099` for implementation-defined server errors, and `-32002` is the code this report picks from that range. The name ResourceExhausted is borrowed from gRPC's RESOURCE_EXHAUSTED status, whose numeric code is 8, so only the name carries over, not the number; confirm that `-32002` does not collide with a code your client or the MCP specification already assigns. Include `retry_after` (seconds) in the error data object. This lets the client (or the LLM via the client) implement exponential back-off. Failing fast is better than queuing because it prevents silent head-of-line blocking and teaches the agent to be parsimonious with expensive tools.
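On the client side, the back-off that `retry_after` enables could look roughly like the sketch below. It assumes a hypothetical `send` callable that returns a parsed JSON-RPC response dict; the function name `call_with_backoff` and the parameters `base`, `cap`, and `max_attempts` are illustrative defaults, not values from any MCP client library.

```python
import random
import time

# Same report-chosen code the server returns when a limit is hit.
RESOURCE_EXHAUSTED = -32002


def call_with_backoff(send, max_attempts=5, base=0.5, cap=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call send() and retry only while the server reports RESOURCE_EXHAUSTED.

    The delay is the larger of the server's retry_after hint and a jittered
    exponential back-off, so the client never retries earlier than asked.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        error = response.get("error")
        if error is None or error.get("code") != RESOURCE_EXHAUSTED:
            return response  # success, or an error that retrying will not fix
        if attempt == max_attempts - 1:
            break  # out of attempts: surface the last error to the caller
        hint = (error.get("data") or {}).get("retry_after", 0.0)
        backoff = min(cap, base * 2 ** attempt)
        jittered = backoff * (0.5 + 0.5 * rng())  # between 50% and 100% of backoff
        sleep(max(hint, jittered))
    return response
```

Returning the final error instead of raising leaves the decision to the agent, which can then switch to a cheaper tool rather than keep retrying.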
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:18:19.197071+00:00— report_created — created