EKS Cluster Network Architecture for Worker Nodes
AWS EKS (Elastic Kubernetes Service) is Amazon’s managed Kubernetes service, and the default setup works like magic once provisioned. But what if you need to customize it to fit your organization's designs, compliance standards, and privacy requirements? This is where things get complicated.
I want to share my experience and the issues I ran into while building a fully private EKS cluster, meaning one that can only be accessed from within the VPC, with no internet access. Following the textbook guidelines, I provisioned the private EKS cluster in a private VPC, and when I tried to attach nodes to the cluster, BOOM!!! it started throwing networking errors. Amazon tries hard to provide a user-friendly interface for provisioning resources in AWS, but some subtle attributes in the setup can burn up your entire week if you don’t fully understand how it works under the hood.
Today, let me explain how EKS networking is set up: we’ll start with a public EKS setup, dive deep into each component, and see how we can achieve a fully private EKS cluster.
Two VPCs for each EKS Cluster
An EKS cluster spans two VPCs. The first VPC is managed by AWS, and the Kubernetes control plane resides within it (this VPC is not visible to users). The second VPC is the customer VPC, which we specify during cluster creation. This is where we place all the worker nodes.
Cluster Endpoint Access Types
The cluster endpoint setting determines how the Kubernetes API server can be accessed (see the sketch after this list for how to inspect it).
- Public: The cluster endpoint is accessible from outside of your VPC (Customer Managed VPC). Worker node traffic will leave your VPC (Customer Managed VPC) to connect to the endpoint (in the AWS Managed VPC).
- Public and private: The cluster endpoint is accessible from outside of your VPC (Customer Managed VPC). Worker node traffic to the endpoint will stay within your VPC (Customer Managed VPC).
- Private: The cluster endpoint is only accessible through your VPC. Worker node traffic to the endpoint will stay within your VPC.
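To see which of these modes a given cluster is using, you can read the cluster's resourcesVpcConfig from the EKS API. Here is a minimal boto3 sketch, assuming a cluster named my-cluster and the us-east-1 region (both placeholders):

```python
import boto3

eks = boto3.client("eks", region_name="us-east-1")  # example region

# Read the endpoint access configuration of an existing cluster.
vpc_config = eks.describe_cluster(name="my-cluster")["cluster"]["resourcesVpcConfig"]

print("Public endpoint enabled: ", vpc_config["endpointPublicAccess"])
print("Private endpoint enabled:", vpc_config["endpointPrivateAccess"])
print("Public access CIDRs:     ", vpc_config.get("publicAccessCidrs", []))
```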
Public Endpoint Only
This is the default behavior of the EKS cluster. Access to the public endpoint can be restricted to known IP ranges using the cluster's public access CIDR allowlist. Anyone accessing the EKS cluster from outside (e.g. using kubectl) will enter through the public endpoint, pass that allowlist check (plus IAM authentication and Kubernetes RBAC), and reach the control plane.
Any traffic originating from the VPC (e.g. worker nodes trying to communicate with the EKS control plane) also leaves the VPC and reaches the control plane through the public endpoint, so the nodes' public or NAT IP addresses must be covered by the allowlist if you restrict it. Even though the traffic leaves the VPC, it does not leave the AWS network.
For these nodes to connect to the EKS control plane, they require at least one of the following (see the sketch after this list):
1. Public IP address and a route to an Internet Gateway — (where nodes reside in public subnets)
2. NAT Gateway (which already has a public IP address) — (where nodes are in a private subnet and the NAT Gateway is in a public subnet)
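One quick way to check which of these two cases a node subnet falls into is to inspect the route table associated with it. A rough boto3 sketch, assuming a placeholder subnet ID (subnets without an explicit association use the VPC's main route table, which this sketch does not look up):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # example region
subnet_id = "subnet-0123456789abcdef0"              # placeholder subnet ID

# Find the route table explicitly associated with the subnet.
tables = ec2.describe_route_tables(
    Filters=[{"Name": "association.subnet-id", "Values": [subnet_id]}]
)["RouteTables"]

for table in tables:
    for route in table["Routes"]:
        # Only the default route tells us how traffic reaches the internet.
        if route.get("DestinationCidrBlock") != "0.0.0.0/0":
            continue
        if route.get("GatewayId", "").startswith("igw-"):
            print("Default route via Internet Gateway (public subnet)")
        elif route.get("NatGatewayId"):
            print("Default route via NAT Gateway (private subnet with egress)")
```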
Public and Private Endpoints
This option keeps the public endpoint as explained above, but traffic from the customer-managed VPC (e.g. worker nodes connecting to the EKS control plane) goes through the EKS-managed Elastic Network Interfaces (ENIs), i.e. via the private endpoint.
This setup is ideal if you’d like the cluster API to be reachable over the internet while keeping your worker nodes in private subnets, communicating with the EKS control plane through the private endpoint.
Private Endpoint Only
This is the most secure option. But it doesn't mean that the others are insecure. With the right configurations, every setup can be made secure. With this setup, the worker nodes will talk to the EKS control plane via the EKS-managed ENI.
If someone needs to access the EKS cluster with kubectl, they can do so from within the VPC (or from a network connected to it, such as via a VPN or a bastion host). No external traffic will be allowed into the cluster.
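If your cluster was created with the default public endpoint, you can flip it to private-only afterwards with a single update call. A minimal sketch, assuming a placeholder cluster name (the update is asynchronous and takes several minutes to complete):

```python
import boto3

eks = boto3.client("eks", region_name="us-east-1")  # example region

# Disable the public endpoint and enable the private one.
response = eks.update_cluster_config(
    name="my-cluster",  # placeholder cluster name
    resourcesVpcConfig={
        "endpointPublicAccess": False,
        "endpointPrivateAccess": True,
    },
)
print("Update id:", response["update"]["id"], "status:", response["update"]["status"])
```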
What Happens when we provision a Worker Node?
When we request a new node, the following happens (see the sketch after this list):
- A new EC2 instance spins up
- The kubelet (the Kubernetes node agent) is configured and started as part of the boot process on each node
- Kubelet reaches out to the EKS Control Plane to register the node.
- Kubelet receives API commands from the control plane and regularly sends updates to the control plane on node status, capacity, etc.
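With EKS managed node groups, this whole sequence is driven for you: EKS launches the EC2 instances with the right bootstrap configuration, and each kubelet registers itself with the control plane. A minimal boto3 sketch, where the cluster name, node group name, IAM role ARN, and subnet IDs are all placeholders:

```python
import boto3

eks = boto3.client("eks", region_name="us-east-1")  # example region

# Request a managed node group; EKS launches the EC2 instances and the
# kubelet on each node registers itself with the control plane.
eks.create_nodegroup(
    clusterName="my-cluster",                                  # placeholder
    nodegroupName="general-purpose",                           # placeholder
    nodeRole="arn:aws:iam::111122223333:role/eks-node-role",   # placeholder role ARN
    subnets=["subnet-0123456789abcdef0"],                      # placeholder subnet IDs
    scalingConfig={"minSize": 1, "maxSize": 3, "desiredSize": 2},
    instanceTypes=["t3.medium"],
    amiType="AL2_x86_64",
)
```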
VPC Configurations for EKS
Now that you understand how these different endpoint types work, let’s take a deeper look into different ways we can configure our customer-managed VPC.
VPC networking is made up of subnets plus the networking configuration, such as route tables, that controls traffic flow. There are different ways of configuring your VPC, so let’s take a look at each of them. Along the way, I’ll also explain the problem I faced with the fully private VPC setup.
A VPC is made up of subnets, which can either be public or private. The following network combinations are possible in an EKS setup:
- Public Subnets only
- Public and Private Subnets
- Private Subnets only
Public Subnets only
In this setup, all resources such as load balancers and worker nodes are placed in public subnets. This means these resources are reachable from the internet, although access can still be controlled with the right security group and routing configuration.
When provisioning worker nodes in public subnets, each EC2 instance (worker node) is assigned a public IP on launch, provided the subnet's auto-assign public IP setting is enabled. The subnet's CIDR size also limits how many nodes and pods you can run, since the VPC CNI assigns each of them an IP address from the subnet.
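To confirm that a public subnet will actually hand out public IPs, and that it has address space left for nodes and pods, you can check its attributes. A small sketch, with a placeholder subnet ID:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # example region

subnet = ec2.describe_subnets(
    SubnetIds=["subnet-0123456789abcdef0"]  # placeholder subnet ID
)["Subnets"][0]

print("Auto-assign public IP: ", subnet["MapPublicIpOnLaunch"])
print("Available IP addresses:", subnet["AvailableIpAddressCount"])
```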
With this setup, you could use any cluster endpoint setup for your EKS cluster.
Public and Private Subnets
This is the widely used VPC setup for EKS, where the worker nodes reside within the private subnets and the NAT Gateway and load balancers are placed within the public subnet.
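In this layout, Kubernetes discovers where to place load balancers through well-known subnet tags: kubernetes.io/role/elb on public subnets for internet-facing load balancers, and kubernetes.io/role/internal-elb on private subnets for internal ones. A sketch of applying those tags with boto3, using placeholder subnet IDs:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # example region

# Tag public subnets so internet-facing load balancers land there...
ec2.create_tags(
    Resources=["subnet-0aaaaaaaaaaaaaaaa"],  # placeholder public subnet
    Tags=[{"Key": "kubernetes.io/role/elb", "Value": "1"}],
)

# ...and private subnets for internal load balancers.
ec2.create_tags(
    Resources=["subnet-0bbbbbbbbbbbbbbbb"],  # placeholder private subnet
    Tags=[{"Key": "kubernetes.io/role/internal-elb", "Value": "1"}],
)
```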
Private Subnets Only
This is what we call a fully private VPC: there is no ingress or egress traffic between the VPC and the internet. For this setup, only the private cluster endpoint should be enabled for your EKS cluster.
This is an uncommon architecture, but it can be seen in organizations where data is highly sensitive, such as banks, hospitals, etc.
The important thing to note here is that we can easily set up an EKS cluster with a private endpoint and allow the worker nodes to communicate with the EKS control plane; that part is handled by EKS through its EKS-managed ENIs. But for the EKS nodes to spin up and join the cluster, they also require access to a few other AWS services. Missing these is a common mistake when provisioning a fully private VPC for EKS.
In general, EKS nodes require VPC endpoints for the following AWS services to be able to function within EKS:
- Amazon ECR (to pull container images)
- Amazon EC2
- Amazon S3
- Amazon CloudWatch Logs
- Amazon STS (for IRSA)
Without VPC endpoints for these services, you can never bring up a worker node for EKS in this setup, because the node requires access to them to bootstrap and function as an EKS worker node. (Note that in certain cases you might not need all of the above, but it’s always good to have them.)
For example, ECR access is essential because, in a fully private VPC, it is the only place container images can be pulled from. Without it, your nodes would not be able to pull the initial cluster add-ons (such as the VPC CNI, kube-proxy, and CoreDNS) and would never function as EKS worker nodes. This is also why S3 is on the list: ECR stores image layers in S3, so image pulls need the S3 gateway endpoint as well.
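Here is a rough boto3 sketch of creating these endpoints in a fully private VPC: interface endpoints for ECR (both the ecr.api and ecr.dkr services), EC2, CloudWatch Logs, and STS, plus a gateway endpoint for S3. The VPC ID, subnet IDs, security group, route table IDs, and region are all placeholders, and the endpoint security group must allow HTTPS (443) from the worker nodes:

```python
import boto3

REGION = "us-east-1"                                  # example region
ec2 = boto3.client("ec2", region_name=REGION)

vpc_id = "vpc-0123456789abcdef0"                      # placeholder VPC ID
private_subnet_ids = ["subnet-0aaaaaaaaaaaaaaaa",
                      "subnet-0bbbbbbbbbbbbbbbb"]     # placeholder private subnets
endpoint_sg_id = "sg-0123456789abcdef0"               # placeholder SG, must allow 443 from nodes
private_route_table_ids = ["rtb-0123456789abcdef0"]   # placeholder route tables

# Interface endpoints needed by worker nodes in a fully private VPC.
interface_services = ["ecr.api", "ecr.dkr", "ec2", "logs", "sts"]
for service in interface_services:
    ec2.create_vpc_endpoint(
        VpcEndpointType="Interface",
        VpcId=vpc_id,
        ServiceName=f"com.amazonaws.{REGION}.{service}",
        SubnetIds=private_subnet_ids,
        SecurityGroupIds=[endpoint_sg_id],
        PrivateDnsEnabled=True,  # so the default service hostnames resolve to the endpoint
    )

# Gateway endpoint for S3 (ECR stores image layers in S3).
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId=vpc_id,
    ServiceName=f"com.amazonaws.{REGION}.s3",
    RouteTableIds=private_route_table_ids,
)
```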
That’s it. I hope this article helps someone understand and troubleshoot EKS networking issues and provision a better architecture. I will do another write-up on how to provision these types of clusters using Terraform. Stay tuned.