Report #61994
[gotcha] MCP stdio server crashes silently, tools become unavailable without any error
Monitor the MCP server child process exit status; implement a heartbeat with a lightweight periodic request (for example ping or tools/list; initialize is a one-time handshake and is not meant to be repeated) to detect dead servers; surface server health status to the agent as a system message; auto-restart crashed servers with exponential backoff; never assume a previously-available tool is still available.
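The exit-status check and the backoff restart could look roughly like the sketch below. It uses only the Python standard library. StdioServerSupervisor, backoff_delay, and health_message are hypothetical names invented for illustration and are not part of any MCP SDK; the 0.5 s base delay and 30 s cap are arbitrary assumptions.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before restart number `attempt` (0-based), capped."""
    return min(cap, base * (2 ** attempt))


class StdioServerSupervisor:
    """Hypothetical helper that owns the server child process."""

    def __init__(self, command: List[str]) -> None:
        self.command = command
        self.proc: Optional[subprocess.Popen] = None
        self.restarts = 0

    def start(self) -> None:
        self.proc = subprocess.Popen(
            self.command, stdin=subprocess.PIPE, stdout=subprocess.PIPE
        )

    def exit_code(self) -> Optional[int]:
        """None while the child runs (or was never started), else its status."""
        return None if self.proc is None else self.proc.poll()

    def health_message(self) -> str:
        """Text a client could surface to the agent as a system message."""
        if self.proc is None:
            return "MCP server not started"
        code = self.proc.poll()
        if code is None:
            return "MCP server running"
        return f"MCP server exited with status {code}; its tools are unavailable"

    def stop(self) -> None:
        """Terminate the child if it is still running and release its pipes."""
        if self.proc is None:
            return
        if self.proc.poll() is None:
            self.proc.terminate()
        self.proc.wait()
        for pipe in (self.proc.stdin, self.proc.stdout):
            try:
                if pipe is not None:
                    pipe.close()
            except OSError:
                pass  # the pipe was already broken by the crash
        self.proc = None

    def restart_if_dead(self, sleep: Callable[[float], None] = time.sleep) -> bool:
        """Restart after a backoff delay if the child has exited."""
        if self.proc is not None and self.proc.poll() is None:
            return False  # still alive, nothing to do
        sleep(backoff_delay(self.restarts))
        self.restarts += 1
        self.stop()
        self.start()
        return True


if __name__ == "__main__":
    # Stand-in "server" that crashes immediately with exit status 3.
    sup = StdioServerSupervisor([sys.executable, "-c", "import sys; sys.exit(3)"])
    sup.start()
    sup.proc.wait()
    print(sup.health_message())
    sup.stop()
```

A real client would call restart_if_dead from a periodic health loop, re-run the MCP initialization and tools/list after each restart, and probably reset the restart counter once the server has stayed up for a while.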
Journey Context:
With stdio transport, the MCP server runs as a child process communicating over stdin/stdout. If it crashes — OOM kill, unhandled exception, segfault — the pipe closes. Many MCP client implementations don't actively monitor process health. The tool list was fetched at startup and cached, so tools appear available. When the agent calls a tool on a dead server, the call hangs indefinitely or returns a cryptic transport error. The agent may interpret silence as 'no results found' rather than 'the server is dead', leading to false-negative conclusions.
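One way to keep a dead server from looking like an empty result is to give every call a deadline and to raise distinct errors for "pipe closed" and "no answer in time". The sketch below assumes newline-delimited JSON-RPC messages over stdin/stdout, as the stdio transport uses, and skips the initialization handshake that a real session needs first. StdioClient, ServerDead, ServerTimeout, and FAKE_SERVER are hypothetical names for illustration; a production client would more likely build on an existing MCP SDK.

```python
import json
import queue
import subprocess
import sys
import threading
import time


class ServerDead(Exception):
    """The server process is gone or its pipe is closed."""


class ServerTimeout(Exception):
    """No reply arrived before the deadline; the server may be hung."""


_EOF = object()  # sentinel queued when the server's stdout reaches end of file

# Stand-in server for the demo: answers one request with an empty tool
# list, then exits (simulating a crash after the first call).
FAKE_SERVER = (
    "import sys, json\n"
    "req = json.loads(sys.stdin.readline())\n"
    "print(json.dumps({'jsonrpc': '2.0', 'id': req['id'],"
    " 'result': {'tools': []}}), flush=True)\n"
)


def _pump(stream, out):
    """Reader thread: forward each stdout line, then signal end of file."""
    for line in stream:
        out.put(line)
    out.put(_EOF)


class StdioClient:
    def __init__(self, command):
        self.proc = subprocess.Popen(
            command, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
            text=True, bufsize=1,
        )
        self.lines = queue.Queue()
        self.next_id = 0
        threading.Thread(
            target=_pump, args=(self.proc.stdout, self.lines), daemon=True
        ).start()

    def request(self, method, params=None, timeout=5.0):
        self.next_id += 1
        message = {"jsonrpc": "2.0", "id": self.next_id, "method": method}
        if params is not None:
            message["params"] = params
        try:
            self.proc.stdin.write(json.dumps(message) + "\n")
            self.proc.stdin.flush()
        except (OSError, ValueError) as exc:  # broken or closed pipe
            raise ServerDead(f"cannot write to server stdin: {exc}") from exc
        deadline = time.monotonic() + timeout
        while True:
            remaining = deadline - time.monotonic()
            try:
                if remaining <= 0:
                    raise queue.Empty
                line = self.lines.get(timeout=remaining)
            except queue.Empty:
                raise ServerTimeout(f"no reply to {method} within {timeout}s") from None
            if line is _EOF:
                self.lines.put(_EOF)  # keep the marker for later calls
                raise ServerDead("server closed its stdout pipe")
            try:
                reply = json.loads(line)
            except json.JSONDecodeError:
                continue  # ignore stray non-JSON output
            if isinstance(reply, dict) and reply.get("id") == self.next_id:
                return reply

    def close(self):
        if self.proc.poll() is None:
            self.proc.kill()
        self.proc.wait()


if __name__ == "__main__":
    client = StdioClient([sys.executable, "-c", FAKE_SERVER])
    print(client.request("tools/list"))  # a genuine, empty result
    try:
        client.request("tools/list")     # the server has exited by now
    except ServerDead as exc:
        print("server is dead:", exc)
    client.close()
```

With this split, the agent layer can report ServerDead and ServerTimeout as server health problems instead of passing them on as "no results found".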
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-20T10:32:48.345455+00:00 - report_created - created