Report #4108
[gotcha] High DNS lookup latency and timeouts in Kubernetes due to ndots:5 search expansion
Set ndots:1 in the pod's dnsConfig for workloads doing external lookups, or use fully qualified domain names (FQDNs) with a trailing dot (e.g., 'database.default.svc.cluster.local.') to bypass search-domain expansion entirely.
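A minimal sketch of the pod-level setting, assuming the workload is a Deployment. The helper name `ndots_patch` is hypothetical. It only builds the patch body that places an `ndots` option under the pod template's `dnsConfig`.

```python
import json


def ndots_patch(ndots: int = 1) -> dict:
    """Build a patch body that sets the ndots resolver option for a Deployment's pods.

    Hypothetical helper for illustration; only the dnsConfig/options field names
    come from the Kubernetes pod spec.
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "dnsConfig": {
                        # Option values are strings in the pod spec, not integers.
                        "options": [{"name": "ndots", "value": str(ndots)}]
                    }
                }
            }
        }
    }


# Print the patch as JSON so it can be reviewed before being applied.
print(json.dumps(ndots_patch(1), indent=2))
```

The printed JSON could be passed to something like `kubectl patch deployment <name> --type merge -p '<json>'`. Note that a merge patch probably replaces the whole `options` list, so include any existing resolver options in the patch. Check the resulting /etc/resolv.conf inside a fresh pod before relying on it.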
Journey Context:
Applications in Kubernetes see roughly 5x DNS lookup latency, or outright timeouts, when calling external services like 'api.stripe.com' or resolving short names like 'database'. The default pod /etc/resolv.conf has 'ndots:5' and 'search default.svc.cluster.local svc.cluster.local cluster.local'. Any name with fewer than 5 dots is tried against each search domain before it is tried as written, so resolving 'google.com' (1 dot < 5) produces: google.com.default.svc.cluster.local, google.com.svc.cluster.local, google.com.cluster.local, then finally google.com. That is three failed lookups before the one that succeeds. This floods CoreDNS/kube-dns with queries that return NXDOMAIN, and under load it can exhaust the conntrack table or run past DNS client timeouts. Common mistake: blaming network policies or CoreDNS scaling when the real problem is query volume. Solutions considered: editing the node's resolv.conf (requires a privileged DaemonSet), switching to NodeLocal DNSCache (helps, but doesn't reduce query volume), or using FQDNs with trailing dots. The clean fix is pod-level dnsConfig with ndots:1 for external-facing pods, or training developers to use trailing dots.
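The expansion order described above can be sketched as a small simulation. This is a simplified model of glibc-style resolver behavior, not the resolver's actual code. The function name `candidate_queries` is hypothetical, and other libc implementations (musl, for example) may order or skip candidates differently.

```python
from typing import List

# Search list from the default pod resolv.conf quoted in this report.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]


def candidate_queries(name: str, search: List[str], ndots: int) -> List[str]:
    """Return the names a glibc-style resolver would try, in order.

    Simplified model: real resolvers stop at the first successful answer and
    typically issue both A and AAAA queries for each candidate.
    """
    # A trailing dot marks the name as absolute: no search expansion at all.
    if name.endswith("."):
        return [name]

    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]

    if name.count(".") >= ndots:
        # Enough dots: try the name as written first, then the search list.
        return [absolute] + expanded
    # Too few dots: walk the search list first, try the name as written last.
    return expanded + [absolute]


for ndots in (5, 1):
    print(f"ndots:{ndots}", candidate_queries("google.com", SEARCH, ndots))
print("trailing dot", candidate_queries("google.com.", SEARCH, 5))
```

With ndots:5 the real name comes last, after three search-domain candidates. With ndots:1 it comes first, and with a trailing dot it is the only candidate.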
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-15T18:49:27.376866+00:00— report_created — created