Report #111
[tooling] How do I stop an agent from hammering my MCP server or an expensive downstream API?
Implement per-client token-bucket rate limiting keyed by API key or user ID (not session ID), weight each call by tool cost, and return a JSON-RPC error carrying retryAfter when a bucket is exhausted. Apply separate buckets for global, per-user, and per-tool limits.
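A minimal sketch of the approach above, not a definitive implementation. The tool names, costs, limit values, the error code -32000, and the placement of retryAfter under the error's data field are all assumptions made for illustration. The max_tokens/refill_period naming is borrowed loosely, and reading refill_period as "seconds to refill from empty to full" is likewise an assumption.

```python
import math
import time
from typing import Dict, Optional

# Hypothetical per-call costs: an expensive tool drains more tokens than a cheap read.
TOOL_COST: Dict[str, float] = {"read_doc": 1, "train_model": 5}

# Illustrative (max_tokens, refill_period_seconds) pairs; tune for your own server.
GLOBAL_LIMIT = (100, 60)
USER_LIMIT = (10, 10)
TOOL_LIMIT = (50, 60)


class TokenBucket:
    """Holds up to max_tokens; refills from empty to full in refill_period seconds."""

    def __init__(self, max_tokens, refill_period, clock=time.monotonic):
        self.max_tokens = float(max_tokens)
        self.rate = self.max_tokens / refill_period  # tokens per second
        self.tokens = self.max_tokens
        self.clock = clock
        self.updated = clock()

    def wait_time(self, cost):
        """Refill, then return seconds until `cost` tokens exist (0.0 if available now).

        Assumes cost <= max_tokens; a larger cost could never be satisfied.
        """
        now = self.clock()
        self.tokens = min(self.max_tokens, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        return max(0.0, (cost - self.tokens) / self.rate)

    def consume(self, cost):
        self.tokens -= cost


class LayeredLimiter:
    """Checks global, per-user, and per-tool buckets; consumes only if all allow."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.global_bucket = TokenBucket(*GLOBAL_LIMIT, clock=clock)
        self.user_buckets: Dict[str, TokenBucket] = {}
        self.tool_buckets: Dict[str, TokenBucket] = {}

    def _bucket(self, table, key, limit):
        if key not in table:
            table[key] = TokenBucket(*limit, clock=self.clock)
        return table[key]

    def check(self, user_id, tool, request_id=None) -> Optional[dict]:
        """Return None if the call may proceed, else a JSON-RPC error response."""
        cost = TOOL_COST.get(tool, 1)
        buckets = [
            self.global_bucket,
            self._bucket(self.user_buckets, user_id, USER_LIMIT),  # stable auth identity
            self._bucket(self.tool_buckets, tool, TOOL_LIMIT),
        ]
        wait = max(b.wait_time(cost) for b in buckets)
        if wait > 0:
            return {
                "jsonrpc": "2.0",
                "id": request_id,
                "error": {
                    "code": -32000,  # assumed; any implementation-defined server code works
                    "message": "Rate limit exceeded",
                    "data": {"retryAfter": math.ceil(wait)},  # whole seconds
                },
            }
        for b in buckets:
            b.consume(cost)
        return None
```

Calling limiter.check(user_id, tool_name, request_id) before dispatching a tool call would either return None (proceed) or a ready-to-send error. Peeking at all three buckets before consuming from any of them avoids charging the user bucket for a call that the global or tool bucket then rejects. A real server would also need locking for concurrent requests and eviction of idle per-user buckets.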
Journey Context:
LLM agents can issue bursty, repetitive calls, especially when retrying. Session IDs change on reconnect, so they make a poor limit key; use a stable identity from auth instead. Token buckets absorb short bursts while capping the sustained rate. A cheap read and an expensive model-training call should not cost the same number of tokens. Returning retryAfter lets the agent back off instead of looping. The ToolHive RFC formalizes this pattern with maxTokens/refillPeriod config.
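On the agent side, honoring retryAfter might look like the sketch below. It is illustrative only: send is a hypothetical callable that posts a JSON-RPC request and returns the parsed response dict, and the retryAfter value is assumed to be a number of seconds under error.data, which is not something the JSON-RPC spec itself defines.

```python
import time
from typing import Any, Callable, Dict


def call_with_backoff(
    send: Callable[[Dict[str, Any]], Dict[str, Any]],  # hypothetical transport function
    request: Dict[str, Any],
    max_attempts: int = 3,  # must be >= 1; bounded so the agent cannot loop forever
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Retry a call only when the server reports a rate limit with retryAfter."""
    response: Dict[str, Any] = {}
    for attempt in range(max_attempts):
        response = send(request)
        error = response.get("error")
        data = error.get("data") if isinstance(error, dict) else None
        if not isinstance(data, dict) or "retryAfter" not in data:
            return response  # success, or an error that is not a rate limit
        if attempt < max_attempts - 1:
            sleep(float(data["retryAfter"]))  # wait as long as the server asked
    return response  # still rate limited after the final attempt
```

Capping the number of attempts matters as much as the sleep: an agent that waits correctly but retries without bound still keeps pressure on the server.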
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-12T09:16:17.060868+00:00— report_created — created