Report #17823
[gotcha] Container-to-container traffic across Docker Swarm or K8s overlay networks hangs or transfers at 1% speed despite fast host networking
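Reduce the container network MTU to 1450 \(1500 minus VXLAN's 50-byte overhead\). For Docker, set \`"mtu": 1450\` in daemon.json; this covers the default bridge network, and Swarm overlay networks may additionally need the MTU set when the network is created \(for example via the \`com.docker.network.driver.mtu\` driver option\). For Kubernetes overlay networks, set the MTU in the CNI config \(e.g., Calico with \`veth\_mtu: 1450\`\).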
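A minimal sketch of the daemon.json change, assuming the file lives at the usual Linux location /etc/docker/daemon.json and may already hold other settings. The helper name `with_mtu` is hypothetical; only the `"mtu"` key comes from the report. The function merges the key into the existing JSON instead of overwriting the file, and leaves writing the result and restarting the daemon to the operator.

```python
import json


def with_mtu(daemon_json_text: str, mtu: int = 1450) -> str:
    """Return daemon.json content with "mtu" set, keeping all other keys.

    `with_mtu` is an illustrative helper, not part of any Docker tooling.
    An empty or whitespace-only input is treated as an empty config.
    """
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    if not isinstance(config, dict):
        raise ValueError("daemon.json must contain a JSON object")
    config["mtu"] = mtu
    return json.dumps(config, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Example input: an existing config with one unrelated setting.
    existing = '{"log-driver": "json-file"}'
    print(with_mtu(existing))
```

Review the printed JSON before copying it into place, then restart the Docker daemon so the new value takes effect. Existing containers and networks may need to be recreated to pick it up.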
Journey Context:
VXLAN encapsulation adds 50 bytes \(8 VXLAN header \+ 8 UDP header \+ 20 outer IP header \+ 14 inner Ethernet header\). If containers use the standard 1500 MTU, a full-size packet becomes 1550 bytes once encapsulated. Inside an AWS VPC \(which supports a 9001 MTU\) this is fine, but on paths limited to 1500 bytes, such as corporate networks, VPNs, or some cross-VPC links, these packets are fragmented or dropped. TCP traffic is normally sent with the 'DF' \(Don't Fragment\) bit set in the IP header, so an oversized packet is dropped and the sender relies on PMTUD \(Path MTU Discovery\) to learn the smaller limit. If ICMP 'Fragmentation Needed' messages are filtered by security groups or NACLs, PMTUD fails silently and TCP connections hang after the 3-way handshake: small packets get through, large transfers stall. Setting the MTU to 1450 makes a full-size encapsulated packet exactly 1500 bytes, so it fits a standard-MTU path. It leaves no headroom for further encapsulation such as PPPoE \(8 bytes\) or a VPN tunnel; those paths need a lower value still. The tradeoff is a slight throughput reduction on jumbo-frame-capable networks in exchange for reliable operation on standard-MTU networks.
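The arithmetic above can be checked with a short sketch. The per-layer byte counts are the ones itemized in this report and assume an IPv4 underlay; the function names and the `extra_overhead` parameter are hypothetical, and the 8-byte PPPoE figure is included only as an example of additional encapsulation.

```python
# Per-layer VXLAN overhead in bytes, as itemized in the report.
# Assumes an IPv4 underlay; an IPv6 outer header would be 40 bytes instead.
VXLAN_OVERHEAD = {
    "inner_ethernet": 14,
    "vxlan_header": 8,
    "outer_udp": 8,
    "outer_ipv4": 20,
}

TOTAL_OVERHEAD = sum(VXLAN_OVERHEAD.values())


def encapsulated_size(inner_packet: int) -> int:
    """Size of the outer IP packet carrying an inner packet of the given size."""
    return inner_packet + TOTAL_OVERHEAD


def inner_mtu(path_mtu: int, extra_overhead: int = 0) -> int:
    """Largest container MTU whose encapsulated packets still fit in path_mtu.

    extra_overhead covers any further encapsulation on the path
    (for example 8 bytes for PPPoE).
    """
    return path_mtu - TOTAL_OVERHEAD - extra_overhead


if __name__ == "__main__":
    print(encapsulated_size(1500))            # full-size packet, default MTU
    print(inner_mtu(1500))                    # standard 1500-byte path
    print(inner_mtu(1500, extra_overhead=8))  # 1500-byte link with PPPoE
    print(inner_mtu(9001))                    # jumbo-frame path
```

Use the smallest path MTU between any two nodes as `path_mtu`; a single 1500-byte hop is enough to stall large transfers if the container MTU is sized for jumbo frames.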
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-17T06:25:36.021971+00:00— report_created — created