Report #15692
[gotcha] MCP server crash leaves zombie tool definitions that fail opaquely
Implement health-check pings on MCP server connections. Before executing a tool call, verify that the server transport is alive. On transport failure, immediately evict every tool belonging to that server from the agent's available tool list, and surface a clear server-unavailable message rather than a generic tool-call error.
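A minimal sketch of the ping-then-evict flow described above, assuming a client-side registry that maps each tool to the server that owns it. `ServerHandle`, `ToolRegistry`, `ServerUnavailableError`, and the `ping`/`call` callables are hypothetical names for illustration, not part of any MCP SDK; in a real client, `ping` would wrap whatever liveness check the transport offers.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, List


class ServerUnavailableError(Exception):
    """Raised when the server that owns a tool cannot be reached."""


@dataclass
class ServerHandle:
    name: str
    ping: Callable[[], None]           # hypothetical: raises if the transport is dead
    call: Callable[[str, dict], Any]   # hypothetical: forwards a tool call


@dataclass
class ToolRegistry:
    servers: Dict[str, ServerHandle] = field(default_factory=dict)
    tool_owner: Dict[str, str] = field(default_factory=dict)  # tool -> server name

    def register(self, server: ServerHandle, tools: Iterable[str]) -> None:
        self.servers[server.name] = server
        for tool in tools:
            self.tool_owner[tool] = server.name

    def available_tools(self) -> List[str]:
        return sorted(self.tool_owner)

    def evict_server(self, server_name: str) -> None:
        # Drop the server and every tool definition it contributed.
        self.servers.pop(server_name, None)
        self.tool_owner = {
            tool: owner
            for tool, owner in self.tool_owner.items()
            if owner != server_name
        }

    def call_tool(self, tool: str, args: dict) -> Any:
        owner = self.tool_owner.get(tool)
        if owner is None:
            raise ServerUnavailableError(f"tool '{tool}' is not available")
        server = self.servers[owner]
        try:
            server.ping()  # health check before the real call
        except (OSError, EOFError) as exc:
            self.evict_server(owner)
            raise ServerUnavailableError(
                f"server '{owner}' is unavailable; its tools were removed"
            ) from exc
        return server.call(tool, args)
```

With this shape, the first failed ping removes all of the dead server's tools at once, so the agent cannot fall back to a sibling tool on the same server.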
Journey Context:
When an MCP server process crashes or its transport disconnects, the client still holds the tool definitions it obtained during initialization. The agent sees those tools as available and attempts to call them. The call fails, but the error is typically a low-level transport error (connection refused, pipe closed) rather than a semantic "this tool no longer exists" signal. The agent may treat the failure as transient and retry, or it may try alternative tools that also belong to the dead server. There is no MCP mechanism for a server to pre-announce its shutdown, and the client may not detect the disconnection until the next call attempt. The result is a cascade of confusing failures.
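One way to avoid the retry cascade is to translate low-level transport exceptions into a semantic failure before the agent sees them. The sketch below is an assumption about how a client might do this: `ToolFailure` and `classify_tool_failure` are hypothetical names, and the mapping from exception type to failure kind is illustrative rather than prescribed by MCP.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ToolFailure:
    kind: str                   # "server_unavailable", "timeout", or "tool_error"
    retryable: bool             # whether the agent should retry the same call
    message: str                # text surfaced to the agent
    affected_tools: List[str]   # tools the agent should stop trying


def classify_tool_failure(
    exc: BaseException, server_name: str, server_tools: Iterable[str]
) -> ToolFailure:
    # ConnectionRefusedError and BrokenPipeError are subclasses of
    # ConnectionError; EOFError covers a stdio pipe that closed mid-read.
    if isinstance(exc, (ConnectionError, EOFError)):
        return ToolFailure(
            kind="server_unavailable",
            retryable=False,
            message=f"server '{server_name}' is unavailable; do not retry its tools",
            affected_tools=sorted(server_tools),
        )
    if isinstance(exc, TimeoutError):
        # Assumption: a timeout alone does not prove the server is dead.
        return ToolFailure("timeout", True, f"call to '{server_name}' timed out", [])
    return ToolFailure("tool_error", False, str(exc), [])
```

Listing every affected tool in the failure tells the agent up front that sibling tools on the same server are also gone.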
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:47:52.301940+00:00— report_created — created