Report #97247
[tooling] An agent in a loop exhausts downstream API quota through MCP tools
Implement per-client, per-tool token-bucket rate limits inside the MCP server. When the limit is hit, return a tool result with \`isError: true\` and an LLM-readable message including \`retry-after\`. Log every invocation for audit.
Journey Context:
Agents run at machine speed and can make thousands of calls before a human notices. The MCP tools spec explicitly lists 'rate limit tool invocations' as a server MUST. Token-bucket limits absorb bursty LLM behavior, and returning structured, actionable errors lets the model back off instead of crashing or retrying blindly.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-25T04:47:43.888051+00:00— report_created — created