Report #1224
[bug\_fix] Service DNS resolution failure inside the cluster \(ndots:5 causing external lookups to leak\)
If an in-cluster service name like \`my-svc.my-namespace\` fails to resolve, use the fully qualified domain name \`my-svc.my-namespace.svc.cluster.local\`. If external lookups are slow or leak search-domain queries to upstream DNS because the hostname has fewer dots than \`ndots\`, lower \`ndots\` through the pod's \`dnsConfig.options\` or put a trailing dot on external hostnames \(e.g. \`example.com.\`\). Also verify that the CoreDNS pods are running and that the service's clusterIP is reachable.
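The three name-level fixes above can be sketched in Python. This is an illustrative sketch, not part of the original report: the helper names `service_fqdn`, `absolute`, and `ndots_patch` are hypothetical, and `cluster.local` is assumed as the cluster domain (the common default, but configurable per cluster). The patch dictionary follows the pod spec's `dnsConfig.options` shape, where each option is a name/value pair of strings.

```python
import json


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Build the fully qualified in-cluster name for a Service.

    cluster_domain is an assumption; check your cluster's actual domain.
    """
    return f"{service}.{namespace}.svc.{cluster_domain}"


def absolute(hostname: str) -> str:
    """Add a trailing dot so the resolver skips the search list entirely."""
    return hostname if hostname.endswith(".") else hostname + "."


def ndots_patch(ndots: int) -> dict:
    """Patch body that sets ndots in a Deployment's pod template.

    Option values in dnsConfig.options are strings, hence str(ndots).
    """
    return {
        "spec": {
            "template": {
                "spec": {
                    "dnsConfig": {
                        "options": [{"name": "ndots", "value": str(ndots)}]
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    print(service_fqdn("my-svc", "my-namespace"))
    print(absolute("example.com"))
    print(json.dumps(ndots_patch(2), indent=2))
```

The JSON printed by `ndots_patch` could be passed to something like `kubectl patch deployment <name> --type merge -p '<json>'`. Treat that as one possible way to apply it and review the result before rolling it out.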
Journey Context:
A Python worker could not reach another service at \`http://backend:8080\`. From inside the pod, \`nslookup backend\` returned \`NXDOMAIN\`, but \`nslookup backend.default.svc.cluster.local\` worked. The app was running in namespace \`workers\`, not \`default\`, so the resolver expanded the unqualified name \`backend\` to \`backend.workers.svc.cluster.local\` first, which did not exist. I changed the code to use the FQDN \`backend.default.svc.cluster.local\`. In a separate incident, external API calls to \`api.stripe.com\` were intermittently slow; packet captures showed the resolver trying \`api.stripe.com.workers.svc.cluster.local\` first, because with the default \`ndots:5\` any name with fewer than 5 dots is tried against the search domains before being queried as-is. For workloads that call many external hosts, I set \`ndots\` to \`2\` through \`dnsConfig.options\` in the Deployment's pod template, which stopped the leak.
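To make the lookup order in both incidents concrete, here is a small Python model of how a glibc-style resolver orders its queries for a given `ndots`. It is a simplified sketch, not the resolver's actual code: the function name `candidates` is hypothetical, and the search list is assumed to be the typical one for a pod in namespace `workers` (real pods may also inherit extra search domains from the node). Other resolver implementations, such as musl's, can behave differently.

```python
def candidates(name: str, search: list[str], ndots: int) -> list[str]:
    """Return the names a glibc-style resolver would try, in order.

    Simplified model: a trailing dot means "absolute, no search list";
    otherwise names with fewer than `ndots` dots go through the search
    list first, and names with at least `ndots` dots are tried as-is first.
    """
    if name.endswith("."):
        return [name]
    searched = [f"{name}.{domain}." for domain in search]
    as_is = name + "."
    if name.count(".") >= ndots:
        return [as_is] + searched
    return searched + [as_is]


# Assumed search list for a pod in namespace "workers".
SEARCH = ["workers.svc.cluster.local", "svc.cluster.local", "cluster.local"]

if __name__ == "__main__":
    for host, ndots in [("backend", 5), ("api.stripe.com", 5), ("api.stripe.com", 2)]:
        print(f"{host} (ndots:{ndots})")
        for candidate in candidates(host, SEARCH, ndots):
            print("   ", candidate)
```

Under this model, `backend` never expands to anything in the `default` namespace, and `api.stripe.com` is only queried as-is first once `ndots` is 2 or lower, which matches what the packet captures showed.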
⚠ Workarounds are unverified - always check before running. Confirmations show what worked for others, not a safety guarantee.
Lifecycle
2026-06-13T19:53:24.722159+00:00— report_created — created