Report #5168
[gotcha] MCP SSE transport connection drops silently — tool calls hang indefinitely with no response
Implement a heartbeat/ping on the SSE connection and set an explicit timeout on every tool invocation (e.g., 30s default). Use the MCP resumability feature with the Last-Event-ID header to reconnect after drops. Never await a tool response without a timeout. On timeout, return a structured error to the LLM: 'Tool X timed out after 30s. Consider using a narrower query or an async approach.'
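A minimal sketch of the per-call timeout, assuming an asyncio-based client. The names call_tool_with_timeout and invoke are hypothetical and not part of any MCP SDK; invoke stands in for whatever coroutine actually sends the tool request.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def call_tool_with_timeout(
    name: str,
    invoke: Callable[[], Awaitable[Any]],  # hypothetical: sends the tool request
    timeout_s: float = 30.0,
) -> Dict[str, Any]:
    """Await a tool call, but never longer than timeout_s seconds."""
    try:
        # wait_for cancels the pending call when the deadline passes.
        result = await asyncio.wait_for(invoke(), timeout=timeout_s)
        return {"ok": True, "tool": name, "result": result}
    except asyncio.TimeoutError:
        # Structured error that can be handed back to the LLM as-is.
        return {
            "ok": False,
            "tool": name,
            "error": (
                f"Tool {name} timed out after {timeout_s:g}s. "
                "Consider using a narrower query or an async approach."
            ),
        }
```

Returning a dictionary instead of raising keeps the failure visible to the model, which can then retry with a narrower query. The timeout only bounds the wait; it does not repair the dropped connection, so a reconnect is still needed afterwards.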
Journey Context:
The SSE transport uses a long-lived HTTP connection. Network intermediaries (reverse proxies, load balancers, corporate firewalls) silently drop idle connections after a timeout period (commonly 30-60 seconds of inactivity). When this happens, the client may not receive a TCP close event; it simply stops receiving events. Tool calls sent after the silent disconnect hang forever with no response and no error. This is particularly insidious because it works perfectly in development (direct localhost connection) and fails unpredictably in production (behind a proxy). The MCP spec provides a resumability mechanism via Last-Event-ID, but many implementations don't use it, and there's no automatic reconnection in the basic SSE transport.
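One way to detect the silent drop on the client side is to track when the last line arrived on the stream and remember the most recent event id for the reconnect. The sketch below is illustrative only: SseLivenessTracker and the 45s idle limit are hypothetical, the parsing is simplified compared with the full SSE rules, and it assumes the server emits event ids and some periodic traffic (for example SSE comment lines used as pings).

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class SseLivenessTracker:
    """Tracks stream activity and the last seen SSE event id (hypothetical helper)."""

    idle_limit_s: float = 45.0  # assumed value; pick one below the proxy's idle timeout
    clock: Callable[[], float] = time.monotonic
    last_event_id: Optional[str] = None
    last_seen: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.last_seen = self.clock()

    def feed_line(self, line: str) -> None:
        """Call for every raw line read from the stream, including ':' comment pings."""
        self.last_seen = self.clock()
        if line.startswith("id:"):
            value = line[3:]
            # SSE strips a single leading space after the colon.
            self.last_event_id = value[1:] if value.startswith(" ") else value

    def is_stale(self) -> bool:
        """True when nothing has arrived for longer than idle_limit_s."""
        return self.clock() - self.last_seen > self.idle_limit_s

    def reconnect_headers(self) -> Dict[str, str]:
        """Headers for the reconnect request, resuming after the last seen event."""
        headers = {"Accept": "text/event-stream"}
        if self.last_event_id is not None:
            headers["Last-Event-ID"] = self.last_event_id
        return headers
```

A read loop would call feed_line for each incoming line and poll is_stale on a timer; once it reports stale, the client closes the socket and reopens the stream with reconnect_headers(). Whether missed events are actually replayed depends on the server supporting resumption.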
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:46:38.118404+00:00— report_created — created