Report #5249
[bug\_fix] Connection reset by peer / SSL SYSCALL error: EOF detected
Configure PostgreSQL TCP keepalive parameters \(tcp\_keepalives\_idle, tcp\_keepalives\_interval, tcp\_keepalives\_count\) to send probes before the load balancer or firewall idle timeout expires, and implement connection validation in the application connection pool \(testOnBorrow or equivalent\) to detect and evict stale connections before use.
Journey Context:
A Java Spring Boot application running on AWS ECS behind an Application Load Balancer experiences sporadic "Connection reset by peer" and "SSL SYSCALL error: EOF detected" errors when querying AWS RDS PostgreSQL. The errors consistently follow periods of low activity. Monitoring reveals a pattern: if a connection sits idle for 60 seconds or longer, the next query on that connection fails with a reset. Investigation shows that the AWS ALB idle timeout is set to 60 seconds. PostgreSQL's tcp\_keepalives\_idle defaults to 0, which defers to the operating system default; on Linux that is 7200 seconds \(2 hours\), so the server only starts sending TCP keepalive probes after 2 hours of idle time. When the application connection pool \(HikariCP\) holds a connection idle for 60 seconds, the ALB silently drops the TCP connection mapping from its state table without sending a RST packet to either side, so the client \(the Java application\) still believes the connection is established. When the application borrows the stale connection from the pool and sends a query, the ALB has no mapping for the packet and answers with a RST, which the client reports as "Connection reset by peer".
The developer initially tries setting HikariCP's maxLifetime to 300000 ms \(5 minutes\) to force rotation, but this creates unnecessary connection churn and does not solve the problem: a connection can still sit idle for 60 seconds within its 5-minute lifetime.
The correct fix is to have PostgreSQL send TCP keepalive probes before the ALB timeout expires. In the RDS parameter group, the developer sets: tcp\_keepalives\_idle = 30 \(start probing after 30 seconds idle\), tcp\_keepalives\_interval = 10 \(send a probe every 10 seconds\), tcp\_keepalives\_count = 3 \(declare the peer dead after 3 missed probes\). With these values, PostgreSQL sends a TCP keepalive probe after 30 seconds of idle time. The ALB sees this traffic and resets its idle timer, so the connection mapping stays in its state table.
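The timing argument above can be checked with a little arithmetic. The following is a minimal sketch, not part of PostgreSQL or any library: the class name `KeepaliveCheck` and its methods are hypothetical, and the second set of values \(7200 / 75 / 9\) is assumed to be the stock Linux kernel keepalive defaults that apply when PostgreSQL leaves the settings at 0.

```java
/**
 * Models the three PostgreSQL keepalive settings from this report and
 * compares them with an intermediary's idle timeout. All values are seconds.
 * Class and method names are illustrative only.
 */
final class KeepaliveCheck {
    final int idle;      // tcp_keepalives_idle
    final int interval;  // tcp_keepalives_interval
    final int count;     // tcp_keepalives_count

    KeepaliveCheck(int idle, int interval, int count) {
        this.idle = idle;
        this.interval = interval;
        this.count = count;
    }

    /** True if the first probe goes out before the intermediary's idle timer expires. */
    boolean firstProbeBeatsTimeout(int idleTimeoutSeconds) {
        return idle < idleTimeoutSeconds;
    }

    /** Approximate idle time after which an unresponsive peer is declared dead. */
    int secondsUntilDeclaredDead() {
        return idle + interval * count;
    }

    public static void main(String[] args) {
        int lbIdleTimeout = 60; // idle timeout observed in this report

        KeepaliveCheck tuned = new KeepaliveCheck(30, 10, 3);
        // Assumption: Linux kernel defaults, used when PostgreSQL settings are 0.
        KeepaliveCheck osDefaults = new KeepaliveCheck(7200, 75, 9);

        System.out.println("tuned probes in time:    " + tuned.firstProbeBeatsTimeout(lbIdleTimeout));
        System.out.println("tuned dead-peer window:  " + tuned.secondsUntilDeclaredDead() + "s");
        System.out.println("default probes in time:  " + osDefaults.firstProbeBeatsTimeout(lbIdleTimeout));
    }
}
```

The useful property is the margin: with an idle value of 30 the first probe leaves well inside the 60-second window, whereas any idle value at or above the timeout leaves the mapping to expire first.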
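Additionally, the developer configures HikariCP with connectionTestQuery="SELECT 1" so that connections are validated when they are borrowed from the pool, catching any edge case where a connection is still stale. With a JDBC4-compliant driver such as the PostgreSQL JDBC driver, HikariCP's documentation recommends leaving connectionTestQuery unset, in which case the pool validates with Connection.isValid\(\) instead.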
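The validate-on-borrow idea can be illustrated without any pooling library. The sketch below is a deliberately simplified stand-in, not HikariCP's implementation: `ValidatingPool`, `borrow`, and `giveBack` are hypothetical names, and the validity check is passed in as a predicate. With real JDBC connections that predicate would typically wrap `Connection.isValid(timeoutSeconds)`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

/**
 * Toy pool that tests each idle resource before handing it out and
 * discards the ones that fail the check. Illustrative only.
 */
final class ValidatingPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;    // opens a fresh resource
    private final Predicate<T> isValid;   // e.g. a wrapper around Connection.isValid

    ValidatingPool(Supplier<T> factory, Predicate<T> isValid) {
        this.factory = factory;
        this.isValid = isValid;
    }

    /** Returns a validated idle resource, or a new one if none passes. */
    synchronized T borrow() {
        T candidate;
        while ((candidate = idle.pollFirst()) != null) {
            if (isValid.test(candidate)) {
                return candidate;
            }
            // Stale resource: it is dropped here. A real pool would also close it.
        }
        return factory.get();
    }

    /** Puts a resource back on the idle list for later reuse. */
    synchronized void giveBack(T resource) {
        idle.addFirst(resource);
    }
}
```

Validation costs a round trip on borrow, which is why it works best as a backstop behind the keepalive settings instead of as the primary fix.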
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T20:54:40.285725+00:00— report_created — created