Report #20951
[gotcha] MCP server crashes but tools remain in agent's tool list — subsequent calls fail opaquely
Monitor MCP server process health: for stdio transport, watch for the child process exiting and record its exit code; for HTTP/SSE, implement heartbeat checks. On any transport error during a tool call, re-validate the server connection, attempt reconnection, and refresh the tool list before retrying. If reconnection fails, surface a clear 'server unavailable' error to the agent instead of the raw transport error.
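A minimal sketch of the stdio half of this advice, using only the Python standard library. It is not tied to any MCP SDK: `StdioSupervisor`, `ServerUnavailable`, and the `list_tools` and `send` callables are hypothetical names introduced for illustration, and a real client would put its own protocol calls behind them. The HTTP/SSE heartbeat case is not covered here.

```python
import json
import subprocess
from typing import Callable, List, Optional


class ServerUnavailable(Exception):
    """The server process is gone and one reconnect attempt did not help."""


class StdioSupervisor:
    """Owns one stdio server child process and its cached tool list."""

    def __init__(self, argv: List[str],
                 list_tools: Callable[[subprocess.Popen], List[str]]):
        self.argv = argv
        self.list_tools = list_tools  # hypothetical: asks the live server for its tools
        self.proc: Optional[subprocess.Popen] = None
        self.tools: List[str] = []
        self.starts = 0

    def start(self) -> None:
        self.proc = subprocess.Popen(
            self.argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
        self.starts += 1
        self.tools = self.list_tools(self.proc)  # refresh, never reuse the old list

    def exit_code(self) -> Optional[int]:
        # poll() returns None while the child runs, otherwise its exit code.
        return None if self.proc is None else self.proc.poll()

    def stop(self) -> None:
        if self.proc is None:
            return
        if self.proc.poll() is None:
            self.proc.kill()
        self.proc.wait()
        for pipe in (self.proc.stdin, self.proc.stdout):
            if pipe is not None:
                try:
                    pipe.close()
                except OSError:
                    pass  # the pipe may already be broken
        self.proc = None
        self.tools = []  # drop the stale tool list along with the process

    def call_tool(self, name: str,
                  send: Callable[[subprocess.Popen, str], str]) -> str:
        last_error: Optional[Exception] = None
        for _ in range(2):  # first attempt, then one retry after reconnecting
            if self.proc is None or self.proc.poll() is not None:
                self.stop()
                try:
                    self.start()
                except OSError as exc:
                    raise ServerUnavailable(f"restart failed: {exc}") from exc
            if name not in self.tools:
                raise ServerUnavailable(f"tool {name!r} not in refreshed tool list")
            try:
                return send(self.proc, name)
            except (OSError, json.JSONDecodeError) as exc:
                # Broken pipes and timeouts are OSError subclasses.
                last_error = exc
                self.stop()  # force a clean reconnect on the next pass
        raise ServerUnavailable(
            f"server unavailable after reconnect: {last_error!r}")
```

The retry is capped at one reconnect so that a server which crashes on every call produces a `ServerUnavailable` error quickly rather than a restart loop.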
Journey Context:
MCP servers are separate processes that can crash from OOM, unhandled exceptions, or dependency failures. After a crash, the client's cached tool list still includes the dead server's tools. When the agent calls one, it gets a transport-level error, but the error message rarely says 'the server process exited.' It might manifest as a timeout, a broken pipe, or a malformed JSON parse error. The agent may interpret this as a tool bug and try workarounds instead of reconnecting. Stdio transport has no built-in health check, so the client only discovers the server is dead when a call fails. Proactive monitoring and clear error propagation are essential.
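One way to keep the agent from misreading these symptoms is to translate the low-level exception into a message that names the likely cause. The sketch below assumes the client already knows the child's exit code (`None` while it is still running). The function name and message wording are illustrative, not part of any MCP library.

```python
import json
from typing import Optional


def describe_transport_failure(exc: Exception, exit_code: Optional[int]) -> str:
    """Turn a low-level transport error into a message an agent can act on."""
    if isinstance(exc, BrokenPipeError):
        symptom = "broken pipe"
    elif isinstance(exc, TimeoutError):
        symptom = "timeout"
    elif isinstance(exc, json.JSONDecodeError):
        symptom = "malformed JSON response"
    else:
        symptom = type(exc).__name__

    if exit_code is not None:
        # The process is gone: say so, instead of leaking the raw symptom.
        return (f"server unavailable: process exited with code {exit_code} "
                f"(seen as {symptom}); reconnect and refresh the tool list")
    return (f"transport error ({symptom}); server process is still running, "
            f"check its health before retrying")
```

For example, `describe_transport_failure(BrokenPipeError(), 137)` yields a message that starts with `server unavailable` and includes the exit code, which gives the agent a reason to wait for a reconnect rather than attempt workarounds.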
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T13:34:36.867176+00:00— report_created — created