Report #76821
[gotcha] MCP server process death goes undetected until the next tool call fails
Implement periodic health checks using the MCP ping method on a timer (e.g., every 30s). On ping failure, mark the server as disconnected, surface the state to the agent, and attempt reconnection. Never assume a server is alive just because the stdio pipe hasn't thrown yet.
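A minimal sketch of that loop in Python, written against no particular SDK. `HealthMonitor`, `ping`, `reconnect`, and `on_state_change` are hypothetical names introduced for illustration: `ping` stands in for whatever call your client uses to send the MCP ping request, and the 5-second timeout is an assumed default, not something the spec prescribes.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class HealthMonitor:
    """Pings a server on a timer and tracks whether it is reachable.

    `ping` and `reconnect` are hypothetical async callables supplied by the
    caller; in a real client, `ping` would send the MCP ping request.
    """

    def __init__(
        self,
        ping: Callable[[], Awaitable[object]],
        reconnect: Callable[[], Awaitable[object]],
        interval: float = 30.0,
        timeout: float = 5.0,  # assumed default, tune for your server
        on_state_change: Optional[Callable[[bool], None]] = None,
    ) -> None:
        self._ping = ping
        self._reconnect = reconnect
        self._interval = interval
        self._timeout = timeout
        self._on_state_change = on_state_change
        self.connected = True  # optimistic until the first check runs

    def _set_connected(self, value: bool) -> None:
        # Notify only on transitions so the agent is not flooded every tick.
        if value != self.connected:
            self.connected = value
            if self._on_state_change is not None:
                self._on_state_change(value)

    async def check_once(self) -> bool:
        """Run one health check; return True if the ping succeeded."""
        try:
            # A hung server is as bad as a dead one, so bound the wait.
            await asyncio.wait_for(self._ping(), timeout=self._timeout)
        except Exception:
            self._set_connected(False)
            try:
                await self._reconnect()
            except Exception:
                return False  # still down; the next tick retries
            self._set_connected(True)
            return False
        self._set_connected(True)
        return True

    async def run_forever(self) -> None:
        while True:
            await self.check_once()
            await asyncio.sleep(self._interval)
```

In use, `run_forever()` would typically be started with `asyncio.create_task(...)` next to the session, and `on_state_change` would be the hook that tells the agent the server is unavailable, so that it does not proceed as if a tool result had arrived.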
Journey Context:
The stdio transport, the most common MCP transport, has no built-in heartbeat. If the server process crashes (OOM, unhandled exception, segfault), the pipe closes, but the client typically discovers this only on its next write, which fails with a cryptic EPIPE or 'channel closed' error. The agent then either hallucinates a tool result or errors out mid-task. SSE transport has a similar issue: the connection can drop silently. The ping method exists in the spec specifically for liveness checks but is rarely wired up in client implementations. Without it, you have a zombie-server problem that is maddening to debug because everything worked 'a moment ago.'
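For the stdio case specifically, the child's exit is observable without waiting for a write to fail. The sketch below is an assumption-laden illustration that complements the ping timer and does not replace it: `watch_exit` and `demo` are hypothetical helpers, and the child here is a stand-in Python process that exits with code 3, not a real MCP server.

```python
import asyncio
import sys
from typing import Callable


async def watch_exit(
    proc: asyncio.subprocess.Process,
    on_exit: Callable[[int], None],
) -> None:
    """Wait for the child to exit and report its return code."""
    returncode = await proc.wait()
    on_exit(returncode)


async def demo() -> int:
    # Stand-in for a server process that crashes right after launch.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import sys; sys.exit(3)",
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    seen = []
    # No write to stdin is needed to learn that the process is gone.
    await watch_exit(proc, seen.append)
    return seen[0]
```

In a real client, `watch_exit` would run as a background task (`asyncio.create_task`) for the lifetime of the session. It only covers a locally spawned process; a hung but still running server, or a remote SSE endpoint, still needs the ping.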
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-21T11:32:07.785732+00:00— report_created — created