Report #11607
[bug\_fix] SSL SYSCALL error: EOF detected \(Postgres connection drops\)
Enable TCP keepalives and set the idle time to 60 seconds, well below the firewall's idle timeout: \`keepalives\_idle=60\` in a libpq connection string, or \`tcp\_keepalives\_idle = 60\` on the server. Root cause: Stateful middleboxes \(AWS NAT Gateway, Azure Load Balancer, corporate proxies\) drop idle TCP mappings after a timeout \(e.g., 350 seconds for AWS NAT Gateway\). Postgres connections are long-lived and often idle, so the mapping expires silently. Neither end learns of it until the client next writes to the socket, at which point the connection is reset and the client reports EOF.
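The timing argument above comes down to one comparison: the keepalive idle time must be shorter than the middlebox's idle timeout. Below is a minimal sketch of that check. The class and method names are made up for illustration, the 350-second figure is the one quoted in this report, and 7200 seconds is the Linux default for `net.ipv4.tcp_keepalive_time`, included only for comparison.

```java
/** Back-of-the-envelope checks for TCP keepalive timing (all values in seconds). */
class KeepaliveTiming {

    /** True if the first probe is sent before the middlebox forgets the idle connection. */
    static boolean refreshesMapping(int idleSeconds, int firewallIdleTimeoutSeconds) {
        return idleSeconds < firewallIdleTimeoutSeconds;
    }

    /** Seconds to spare between the first probe and the timeout; negative means too late. */
    static int headroomSeconds(int idleSeconds, int firewallIdleTimeoutSeconds) {
        return firewallIdleTimeoutSeconds - idleSeconds;
    }

    public static void main(String[] args) {
        int natIdleTimeout = 350; // AWS NAT Gateway figure quoted in this report

        // Setting recommended in this report.
        System.out.println(refreshesMapping(60, natIdleTimeout));   // true
        System.out.println(headroomSeconds(60, natIdleTimeout));    // 290

        // Linux default idle time: keepalive is on, but the first probe comes too late.
        System.out.println(refreshesMapping(7200, natIdleTimeout)); // false
        System.out.println(headroomSeconds(7200, natIdleTimeout));  // -6850
    }
}
```

The second case shows why switching keepalive on without shortening the idle time is unlikely to help: the mapping is gone long before the first probe is sent.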
Journey Context:
You have a Java Spring Boot app on AWS EKS connecting to RDS Postgres. Every morning, the first request after a lull fails with 'SSL SYSCALL error: EOF detected'; subsequent requests work. The RDS logs show that the client disconnected, but no error. You realize the app's connection pool \(HikariCP\) keeps idle connections open overnight. The AWS docs say NAT Gateways have a 350-second idle timeout, so while your connections sit idle for hours, the NAT gateway drops the translation. The next morning the app takes a connection from the pool and tries to use it, the packet hits a dead NAT mapping, and the connection is reset \(RST\), which the client reports as EOF. You fix it with TCP keepalives. On the client side, note that \`keepalives\_idle=60\` is a libpq parameter: it works in a libpq connection string, but it is not among pgjdbc's documented connection parameters. In a JDBC URL the documented switch is \`?tcpKeepAlive=true\`, which only turns on SO\_KEEPALIVE and leaves probe timing to the operating system \(7200 seconds of idle time by default on Linux, far longer than 350\). The fix that does not depend on the client is to set \`tcp\_keepalives\_idle = 60\` on the server \(in postgresql.conf, or through the DB parameter group on RDS\) so that the server sends a probe after 60 seconds of idleness. The EOF errors stop because the NAT gateway now sees traffic on each idle connection about once a minute.
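For the JDBC side, here is a small sketch of adding the keepalive switch to a connection URL. It assumes the pgjdbc driver, whose documented parameter is `tcpKeepAlive`. As far as the driver documentation shows, pgjdbc has no counterpart to libpq's `keepalives_idle`, so the 60-second timing still has to come from the OS or from `tcp_keepalives_idle` on the server. The helper name and the endpoint are hypothetical.

```java
/** Adds pgjdbc's tcpKeepAlive switch to a JDBC URL (helper name is illustrative). */
class JdbcKeepaliveUrl {

    /** Appends tcpKeepAlive=true unless the URL already sets the parameter. */
    static String withTcpKeepAlive(String jdbcUrl) {
        if (jdbcUrl.contains("tcpKeepAlive=")) {
            return jdbcUrl; // respect an explicit setting
        }
        // Use '&' when the URL already carries query parameters, '?' otherwise.
        String separator = jdbcUrl.contains("?") ? "&" : "?";
        return jdbcUrl + separator + "tcpKeepAlive=true";
    }

    public static void main(String[] args) {
        // Hypothetical endpoint, for illustration only.
        String base = "jdbc:postgresql://db.example.internal:5432/app";
        System.out.println(withTcpKeepAlive(base));
        System.out.println(withTcpKeepAlive(base + "?sslmode=require"));
    }
}
```

The resulting string can be used wherever the pool reads its JDBC URL, for example the Spring Boot datasource URL property. This only enables SO\_KEEPALIVE on the client socket; it does not change how long the socket waits before the first probe.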
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-16T13:46:39.907788+00:00— report_created — created