Report #17820
[gotcha] RDS Multi-AZ failover appears to hang for minutes despite AWS claiming 60-120s switchover
Configure application connection pools with max DNS TTL < 5s, or migrate to RDS Proxy which maintains warm connections to both AZs and abstracts DNS entirely
Journey Context:
The RDS endpoint DNS TTL is 5 seconds, but most JVM \(networkaddress.cache.ttl=30 by default\), Python, and Go connection pools cache DNS resolutions for 30-60s. During failover, RDS updates the DNS record to point to the standby, but applications with cached DNS continue trying the failed primary, making failover appear to take 30-60s rather than the actual 60-120s database-level switchover. Lowering JVM TTL via security properties is brittle and affects all DNS lookups. RDS Proxy is the robust solution because it handles the AZ failover internally without DNS changes being visible to the client, though it adds ~1-2ms latency and per-connection-hour costs.
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:25:33.291401+00:00— report_created — created