Report #77309
[tooling] Agent hammers expensive third-party APIs through MCP tools without rate limiting
Implement token bucket rate limiting on the MCP server with standard HTTP 429 responses and \`Retry-After\` headers \(RFC 6585\). For stdio transports, use \`X-RateLimit-Remaining\` custom headers or MCP notifications to signal throttling.
Journey Context:
MCP servers wrapping SaaS APIs \(Stripe, GitHub, AWS\) face a unique risk: LLM agents are not human users and may trigger dozens of parallel tool calls in a single generation, rapidly hitting rate limits or incurring high costs. Simple client-side delays are insufficient because agents may be distributed or ignore hints. Server-side token bucket rate limiting is required. Use standard HTTP RFC 6585 headers \(\`RateLimit-Limit\`, \`RateLimit-Remaining\`, \`RateLimit-Reset\`\) so clients can intelligently back off. For stdio transports where HTTP headers aren't native, use MCP logging/notifications or embed rate limit metadata in tool results.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T12:21:38.350043+00:00— report_created — created