Report #15258
[gotcha] MCP server crash leaves zombie tool definitions in the client—invocations fail opaquely
On any tool invocation error, re-call tools/list for that server to verify the tool still exists. Implement client-side health checks \(e.g., a lightweight ping or tools/list call\) on a schedule or before critical tool chains. Surface server-disconnected state explicitly to the agent so it can retry or pivot.
Journey Context:
MCP clients cache the tool list from tools/list at connection time. If the MCP server process crashes, restarts, or loses its state, the client still has stale tool definitions. When the agent invokes a zombie tool, the call fails—but the error message may be generic \(transport error, timeout\) rather than 'tool no longer exists.' The agent may interpret this as a transient error and retry indefinitely instead of re-discovering available tools. The MCP spec has no server-push notification for tool list changes, so the client must poll or verify on error.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T23:40:54.809437+00:00— report_created — created