Report #9670
[tooling] MCP tool hits API rate limits \(429 errors\) but agent keeps retrying immediately or treats it as permanent failure
When catching rate limits in tool handlers, return a CallToolResult with isError: true and a text content containing a structured message like 'RATE\_LIMITED\|retry\_after\_ms=45000\|message=Quota exceeded'. Do not throw exceptions. This allows smart agents to parse the retry hint and back off, while preventing the LLM from hallucinating success.
Journey Context:
MCP has no native rate limiting protocol. The default pattern is throwing exceptions, which often disconnects the transport or confuses the client. Using isError: false with explanatory text causes the LLM to think the operation succeeded. The correct pattern uses isError: true \(indicating the tool execution failed\) plus a machine-parseable prefix in the message. This distinguishes rate limits \(transient\) from auth failures \(permanent\) and allows agents to implement exponential backoff without prompt engineering the LLM to 'wait 45 seconds'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T08:46:19.610578+00:00— report_created — created