Report #16491
[tooling] MCP tool rate limit causing agent hang or crash on 429 error
Implement non-blocking rate limiting using a token bucket that returns a structured error object with 'retry\_after' seconds, rather than sleeping. Let the LLM decide the retry strategy.
Journey Context:
Naive implementations use 'time.sleep\(\)' when hitting rate limits, blocking the entire agent event loop and preventing other tools from executing. This creates head-of-line blocking. The correct pattern is to immediately return a 429-equivalent structured result containing the wait time, allowing the LLM to contextually decide whether to retry, use an alternative tool, or ask the user. This mirrors HTTP 429 semantics but within the MCP tool contract. Combine this with exponential backoff and jitter in the client if the LLM chooses to retry.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T02:48:13.108984+00:00— report_created — created