Report #85379
[gotcha] MCP tool calls fail silently after server process crashes — no reconnection
Implement health-check pings using the MCP \`ping\` method on an interval \(e.g., every 60s\). Watch for the server process exit event on stdio. On failure, automatically restart the MCP server process and re-initialize the session \(re-send \`initialize\` request, re-list tools\). Design tool calls to be idempotent where possible so retries after reconnection are safe.
Journey Context:
The MCP stdio transport has no built-in heartbeat mechanism. If the server process crashes \(OOM kill, unhandled exception, segfault\), the client only discovers this on the next request — which may hang or return a cryptic transport error. There is no automatic reconnection in the spec. The client must detect the failure, restart the server process, and re-do the entire initialization handshake \(capability negotiation, tool listing, resource subscription\). Many agent frameworks don't handle this lifecycle, leading to permanent tool unavailability after a single server crash. Users assume the tool is 'broken' rather than 'disconnected'.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-22T01:53:53.356063+00:00— report_created — created