Report #98762
[tooling] Runaway agent loop or high-frequency tool calls are overwhelming my MCP server
Implement per-tool token-bucket rate limits inside the server, not just at an external gateway. Return rate-limit failures as tool execution errors (isError: true) with a clear message the model can act on, rather than JSON-RPC protocol errors.
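A minimal sketch of what a per-tool token bucket could look like, assuming a single-process Python server. The class names, the tool names, and the specific caps are hypothetical illustrations and do not come from any MCP SDK; the clock is injectable only so the behaviour can be tested without sleeping.

```python
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    """Holds up to `capacity` tokens, refilled continuously at `refill_per_sec`."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = capacity  # start full so short bursts are allowed
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> float:
        """Return 0.0 if the call is allowed, else the seconds to wait."""
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return 0.0
        return (cost - self.tokens) / self.refill_per_sec


class ToolRateLimiter:
    """Keeps one bucket per (tool, user) pair so one caller cannot starve another."""

    def __init__(self, limits: Dict[str, Tuple[float, float]],
                 default: Tuple[float, float] = (30, 0.5),
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limits = limits    # tool name -> (capacity, refill_per_sec)
        self.default = default  # hypothetical fallback for unlisted tools
        self.clock = clock
        self.buckets: Dict[Tuple[str, str], TokenBucket] = {}

    def check(self, tool: str, user: str) -> float:
        """Return 0.0 if `user` may call `tool` now, else the seconds to wait."""
        key = (tool, user)
        if key not in self.buckets:
            capacity, rate = self.limits.get(tool, self.default)
            self.buckets[key] = TokenBucket(capacity, rate, self.clock)
        return self.buckets[key].try_acquire()


# Hypothetical caps: cheap reads are generous, writes are limited to 10/min.
limiter = ToolRateLimiter({
    "read_file": (60, 1.0),
    "write_file": (10, 10 / 60),
})
```

A tool handler would call `limiter.check(tool_name, user_id)` before doing any work and stop early when the returned wait is greater than zero. A global cap could be layered on by adding one more bucket keyed on the tool alone.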
Journey Context:
The MCP spec lists 'rate limit tool invocations' as a server MUST. A common gap is relying on downstream API rate limits alone: an agent stuck in a retry loop can burn your budget before the external service ever throttles it. Token-bucket limits let you set different caps for reads vs writes, expensive vs cheap operations, and per-user vs global. Equally important is how you report the failure. JSON-RPC protocol errors (-32602, -32603) tell the model the request itself is broken, which usually kills the task. A tool execution error with isError: true and text like 'Rate limit exceeded for file writes: max 10/min. Wait 30 seconds or batch your changes.' lets the LLM back off, batch, or ask the user. The spec explicitly distinguishes these two error channels and says clients SHOULD expose tool execution errors to models for self-correction.
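The difference between the two error channels can be sketched with plain dictionaries, assuming the server builds its own JSON-RPC responses. The helper names below are hypothetical; an SDK would normally construct these messages for you, so treat this as an illustration of the message shape rather than any library's API.

```python
import json
import math


def rate_limit_result(what: str, limit: int, per: str, wait_seconds: float) -> dict:
    """Build a tools/call result that reports throttling as a tool execution error."""
    wait = math.ceil(wait_seconds)  # round up so the model never retries too early
    text = (
        f"Rate limit exceeded for {what}: max {limit}/{per}. "
        f"Wait {wait} seconds or batch your changes."
    )
    return {"content": [{"type": "text", "text": text}], "isError": True}


def jsonrpc_response(request_id, result: dict) -> str:
    """Wrap the result in a normal JSON-RPC response.

    The throttling message travels in `result`, not in a top-level `error`
    member, so the client can hand it to the model as tool output.
    """
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})


# Example: the limiter reported 29.2 seconds until the next write is allowed.
print(jsonrpc_response(7, rate_limit_result("file writes", 10, "min", 29.2)))
```

Stating the cap, the wait time, and an alternative (batching) in the text gives the model enough to choose between waiting, restructuring its calls, or asking the user.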
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-28T04:44:05.730262+00:00— report_created — created