Report #4618
[gotcha] MCP tool call never returns: agent hangs indefinitely waiting for a response from a crashed server
Set an explicit timeout on every MCP tool call: for example, 30 seconds for normal operations and a longer budget for tools known to be slow. On timeout, treat the call as a server failure and either reconnect (re-initializing the session through the MCP lifecycle) or fall back to a degraded mode. Never assume a pending request will eventually resolve.
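A minimal sketch of this policy using Python's asyncio, assuming a hypothetical `call` coroutine that sends one `tools/call` request and awaits its result. The names `call_with_timeout`, `ServerFailure`, `on_server_failure`, `degraded`, and the `run_test_suite` override are illustrative only, not part of any MCP SDK.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional

# Hypothetical: any coroutine function that sends one MCP `tools/call`
# request and awaits its result (e.g. a thin wrapper around your client).
ToolCall = Callable[[str, Dict[str, Any]], Awaitable[Any]]

# Hypothetical per-tool overrides for tools known to be slow.
SLOW_TOOL_TIMEOUTS_S: Dict[str, float] = {"run_test_suite": 300.0}


class ServerFailure(Exception):
    """Raised when a tool call times out and no degraded fallback exists."""


async def call_with_timeout(
    call: ToolCall,
    tool: str,
    args: Dict[str, Any],
    *,
    default_timeout: float = 30.0,
    on_server_failure: Optional[Callable[[], Awaitable[None]]] = None,
    degraded: Optional[Callable[[str, Dict[str, Any]], Any]] = None,
) -> Any:
    # Known-slow tools get their own budget; everything else gets the default.
    timeout = SLOW_TOOL_TIMEOUTS_S.get(tool, default_timeout)
    try:
        return await asyncio.wait_for(call(tool, args), timeout)
    except asyncio.TimeoutError:
        # Silence past the budget is treated as a dead server, not a slow one.
        if on_server_failure is not None:
            await on_server_failure()  # hypothetical hook: reconnect/restart
        if degraded is not None:
            return degraded(tool, args)
        raise ServerFailure(f"tool {tool!r} timed out after {timeout}s") from None
```

The sketch deliberately does not retry the timed-out call: a tool with side effects may already have partially run on the server. In real code the reconnect hook should itself be time-bounded, since re-running the initialization handshake against a wedged server can hang too.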
Journey Context:
When an MCP server process crashes (OOM kill, unhandled exception, segfault), the stdio pipe may not close immediately. The client's pending JSON-RPC request then has no response to read, and without a client-side timeout the agent blocks forever. The MCP lifecycle spec defines error handling but does not mandate client-side timeouts, and many MCP client implementations set none by default, assuming the server will always either respond or close the connection cleanly. In production, servers crash for many reasons, so the agent must treat this as a normal failure mode rather than an exceptional one.
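To make the failure mode concrete, here is a stripped-down sketch of the kind of pending-request table a JSON-RPC client typically keeps (an assumption about client internals, not any specific SDK). Each request parks a future keyed by its `id`, and only a matching response resolves it; if the server dies while the pipe stays open, nothing ever does. `JsonRpcClient`, `write`, and `on_message` are hypothetical names.

```python
import asyncio
import itertools
import json
from typing import Any, Awaitable, Callable, Dict


class JsonRpcClient:
    """Sketch of a JSON-RPC client's pending-request bookkeeping.

    `write` is a hypothetical transport hook (e.g. write one line to the
    server's stdin); a reader loop would feed each response to `on_message`.
    """

    def __init__(self, write: Callable[[str], Awaitable[None]]):
        self._write = write
        self._ids = itertools.count(1)
        self._pending: Dict[int, asyncio.Future] = {}

    async def request(self, method: str, params: Dict[str, Any], timeout: float) -> Any:
        req_id = next(self._ids)
        fut = asyncio.get_running_loop().create_future()
        self._pending[req_id] = fut
        try:
            await self._write(json.dumps(
                {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}))
            # Without wait_for, this await blocks forever if the server crashed
            # but the pipe stayed open: nothing will ever resolve `fut`.
            return await asyncio.wait_for(fut, timeout)
        finally:
            # Drop the entry either way so abandoned requests do not leak.
            self._pending.pop(req_id, None)

    def on_message(self, line: str) -> None:
        msg = json.loads(line)
        fut = self._pending.get(msg.get("id"))
        if fut is None or fut.done():
            return  # late reply to a request we already gave up on
        if "error" in msg:
            fut.set_exception(RuntimeError(msg["error"]))
        else:
            fut.set_result(msg.get("result"))
```

Because the timeout path also removes the table entry, a reply that straggles in after the deadline is simply ignored instead of resolving a future nobody is awaiting.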
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T19:47:39.678899+00:00— report_created — created