Report #43620
[gotcha] MCP stdio server crashes mid-session but agent doesn't detect the dead process
Monitor the MCP server child process lifecycle explicitly. Watch for process exit events (SIGCHLD, exit code) and pipe closure. When a server process dies, immediately mark all its tools as unavailable and inject a system message so the agent can reason about the failure. For resilience, restart the server automatically with backoff.
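A minimal sketch of that monitoring loop, assuming a Python client built on asyncio. The class name StdioServerSupervisor and its fields (available, messages) are hypothetical and not part of any MCP SDK. For brevity it marks tools available right after spawn, whereas a real client would wait for the initialize handshake first.

```python
import asyncio
import sys


class StdioServerSupervisor:
    """Hypothetical supervisor for a single MCP stdio server child process."""

    def __init__(self, argv, tool_names, max_restarts=3, base_delay=0.5):
        self.argv = list(argv)
        self.tool_names = set(tool_names)
        self.available = set()   # tools the agent is currently allowed to call
        self.messages = []       # system messages to inject into the agent context
        self.max_restarts = max_restarts
        self.base_delay = base_delay
        self.proc = None

    async def run(self):
        """Spawn the server, watch for its exit, and restart with exponential backoff."""
        restarts = 0
        while True:
            self.proc = await asyncio.create_subprocess_exec(
                *self.argv,
                stdin=asyncio.subprocess.PIPE,
                stdout=asyncio.subprocess.PIPE,
            )
            self.available = set(self.tool_names)

            # wait() completes when the child exits, whatever state the pipes
            # are in. A real client must read stdout concurrently so a chatty
            # server cannot fill the pipe buffer and block.
            code = await self.proc.wait()

            # Mark the tools unavailable first, then tell the agent why.
            self.available.clear()
            self.messages.append(
                f"MCP server exited with code {code}; "
                f"tools unavailable: {sorted(self.tool_names)}"
            )

            if restarts >= self.max_restarts:
                self.messages.append("MCP server restart limit reached; giving up.")
                return
            await asyncio.sleep(self.base_delay * 2 ** restarts)
            restarts += 1
```

Running run() as a background task next to the request loop keeps exit detection independent of whether any tool call is in flight. On POSIX a negative exit code means the child was killed by a signal, for example -9 for SIGKILL from the OOM killer.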
Journey Context:
With stdio transport, the MCP client communicates with a child process over stdin/stdout pipes. If that process crashes (OOM kill, unhandled exception, segfault), the pipe may not close immediately, or may close in a way the client doesn't interpret as a fatal error. Subsequent tools/call requests go into a dead pipe and never return, so the agent keeps calling tools that will never respond and appears to hang. This differs from HTTP/SSE transport, where connection errors usually surface promptly as failed requests. The fix requires proactive process monitoring, not just reactive error handling on the pipe.
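To illustrate the hang described above, here is a sketch of a request helper that treats end-of-file and silence as failures instead of waiting forever. The names call_with_liveness and ServerDied are hypothetical. It assumes newline-delimited JSON-RPC over the pipes and, for simplicity, that the next line on stdout is the response. A real client would match responses by id and handle notifications.

```python
import asyncio
import json
import sys


class ServerDied(Exception):
    """Raised when the server exits, closes its pipe, or stops answering."""


async def call_with_liveness(proc, request, timeout=5.0):
    """Send one JSON-RPC request and fail fast instead of hanging on a dead pipe."""
    if proc.returncode is not None:
        raise ServerDied(f"server already exited with code {proc.returncode}")

    try:
        proc.stdin.write((json.dumps(request) + "\n").encode())
        await proc.stdin.drain()
    except (BrokenPipeError, ConnectionResetError) as exc:
        raise ServerDied("stdin pipe is closed") from exc

    try:
        line = await asyncio.wait_for(proc.stdout.readline(), timeout)
    except asyncio.TimeoutError:
        # The pipe stayed open but nothing arrived: check the process itself.
        if proc.returncode is not None:
            state = f"exited with code {proc.returncode}"
        else:
            state = "is still running but silent"
        raise ServerDied(f"no response within {timeout}s; server {state}") from None

    if not line:
        # readline() returns b"" at end-of-file: the server closed stdout.
        raise ServerDied("stdout closed before a response arrived")
    return json.loads(line)
```

The timeout covers the case where the pipe stays open after the server dies, for instance when a grandchild process inherited the write end. The empty-read check covers the clean-close case that a naive read loop can mistake for "no data yet".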
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-19T03:41:16.066081+00:00— report_created — created