Report #15579
[gotcha] Random 'Connection refused' or 'No route to host' errors in a Kubernetes cluster even though pods are healthy and services are otherwise reachable
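Increase the nf_conntrack_max kernel parameter on all nodes (e.g., net.netfilter.nf_conntrack_max = 524288) and make sure nf_conntrack_tcp_timeout_established is reasonable (e.g., 86400 seconds, one day, instead of the kernel default of 432000, five days). Monitor conntrack usage with conntrack -L | wc -l, or more cheaply by comparing /proc/sys/net/netfilter/nf_conntrack_count with nf_conntrack_max. Alternatively, run kube-proxy in IPVS mode instead of iptables mode for high-connection workloads; note that IPVS mode still uses conntrack, so the table size still matters.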
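A minimal sketch of the tuning and monitoring steps in shell, assuming a distribution that loads drop-ins from /etc/sysctl.d (as sysctl --system and systemd-sysctl do). The function names and the drop-in file name 90-conntrack.conf are hypothetical; the values are the ones suggested above, not universal recommendations.

```shell
#!/bin/sh
# Sketch only: persist the conntrack sysctls suggested in this report and
# report table usage. Function names and the drop-in path are hypothetical.

# Write a sysctl drop-in with the suggested values to the given path,
# e.g. /etc/sysctl.d/90-conntrack.conf (hypothetical file name).
write_conntrack_sysctl() {
  cat > "$1" <<'EOF'
net.netfilter.nf_conntrack_max = 524288
net.netfilter.nf_conntrack_tcp_timeout_established = 86400
EOF
}

# Print conntrack table usage as an integer percentage (count * 100 / max).
# The directory is a parameter so the function can be exercised against
# fixture files; on a real node it defaults to /proc/sys/net/netfilter.
conntrack_usage_pct() {
  local dir count max
  dir="${1:-/proc/sys/net/netfilter}"
  count=$(cat "$dir/nf_conntrack_count")
  max=$(cat "$dir/nf_conntrack_max")
  echo $(( count * 100 / max ))
}
```

Run as root on each node, for example write_conntrack_sysctl /etc/sysctl.d/90-conntrack.conf && sysctl --system, then conntrack_usage_pct. Reading nf_conntrack_count avoids dumping every entry the way conntrack -L | wc -l does. kube-proxy may also set nf_conntrack_max and nf_conntrack_tcp_timeout_established at startup from its own conntrack settings, so re-check the effective values after kube-proxy restarts.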
Journey Context:
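Linux uses the conntrack (connection tracking) module to track connections for NAT and stateful firewall rules. Kubernetes kube-proxy implements service proxying with iptables or IPVS, and both modes rely on conntrack. High-throughput services, or those that open many short-lived connections (such as gRPC, databases, or service meshes), can fill the conntrack table, which is often 65536 entries by default (the kernel sizes it from available memory, and kube-proxy may raise it at startup via --conntrack-max-per-core and --conntrack-min). When the table is full, the kernel drops packets that would need a new entry and logs "nf_conntrack: table full, dropping packet", which shows up as seemingly random connection failures. Developers often misdiagnose this as an application-level issue or a network policy block. The fix is to tune the kernel parameters, or to switch kube-proxy to IPVS mode, which scales better than iptables rules when there are many services (though IPVS mode still uses conntrack).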
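To confirm that the table really filled up, rather than blaming the application, one can check the kernel's per-CPU conntrack statistics. A minimal sketch, assuming conntrack-tools is installed on the node; the helper name conntrack_drops is hypothetical, and because the set of fields printed by conntrack -S differs between versions, it only matches drop=N fields.

```shell
#!/bin/sh
# Sketch only: sum the per-CPU drop= counters from `conntrack -S` output.
# Assumption: lines look like "cpu=0 found=0 invalid=0 ... drop=0 early_drop=0 ...".
# The anchored regex matches drop= but not early_drop=.
# conntrack_drops is a hypothetical helper name.
conntrack_drops() {
  awk '{
    for (i = 1; i <= NF; i++)
      if ($i ~ /^drop=/) { split($i, kv, "="); total += kv[2] }
  } END { print total + 0 }'
}
```

For example, conntrack -S | conntrack_drops. A total that keeps rising under load, together with "nf_conntrack: table full, dropping packet" in dmesg, points at table exhaustion rather than an application or network policy problem.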
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T00:26:21.217658+00:00— report_created — created