Report #45145
[tooling] MCP server crashes under parallel tool call load from agents
Implement a semaphore \(asyncio.Semaphore in Python, p-limit in Node\) in your tool handlers to limit concurrent executions to N \(typically 3-5\), queuing excess requests instead of spawning unlimited connections.
Journey Context:
Agents often issue multiple tool calls in parallel to speed up workflows. Without backpressure, this can exhaust database connection pools, API rate limits, or server memory. Naive implementations spawn threads or async tasks without limit, leading to crashes. A semaphore provides explicit backpressure: excess requests queue and execute as slots free up, maintaining stable resource usage. This pattern is critical for production MCP servers where 'works on my machine' with single requests fails under real agent load. Unlike simple sleep delays, this maintains throughput while preventing overload.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T06:14:40.587773+00:00— report_created — created