Thanks to its scalability and extensibility, Kubernetes (K8s) has seen surging adoption in recent years, and public cloud providers have made K8s even more accessible with managed Kubernetes services. When correctly configured, Kubernetes ensures high availability and reliability for your container workloads. However, getting K8s configuration right isn’t always easy.
Without proper guidance, most teams over-provision the underlying hardware or cloud-based node resources for their Kubernetes clusters. This poor node capacity management wastes resources and increases costs. Under-provisioning node capacity is even more dangerous because it leads to application slowdowns and outages.
This article will review how to effectively manage underlying node capacity for Kubernetes clusters and introduce a solution specifically designed for solving this complex challenge.
We first summarize the key points so that you can survey all of the aspects of node capacity management before we delve into each one separately. Effective node capacity management consists of three high-level categories:
1. Kubernetes Cluster Level: These are the things to keep in mind when bootstrapping your cluster
|Setting|Description|
|---|---|
|Max allowed pods|Determines the number of allowed pods on a node; defaults to 110|
|Nodes Network CIDR|Determines the number of nodes that you can have in a cluster|
|Pod Network CIDR|Determines the total number of pods that you can have in a cluster|
2. Kubernetes Resource Level: These are the things to keep in mind when configuring your Deployments and StatefulSets
|Setting|Description|
|---|---|
|Inter-Pod Affinity/Anti-Affinity|Ability to co-locate pods on, or spread them across, nodes based on the labels of pods already running there|
|Pod Node Affinity|Ability to assign pods to a particular type of node(s)|
|Pod Resource Configuration|Ability to configure CPU/memory requests and limits|
3. Node Level: These are the things to keep in mind when deciding the underlying node type for a node pool in your cluster
|Consideration|Description|
|---|---|
|Cloud Resource Optimization|Choosing the correct instance type based on the CPU/memory/network requirements of your application|
|Cloud Instance Pricing|Each instance type is billed differently, which can have a direct impact on COGS|
Now, let’s take a closer look at each aspect of K8s node capacity management. We’ll start with considerations at the cluster level.
Node capacity for any Kubernetes cluster highly depends on how the cluster was bootstrapped. There are three main things you should keep in mind while provisioning your cluster:
Max Allowed Pods is the maximum number of pods you can schedule on a single node in your cluster. Upstream Kubernetes defaults this value to 110, but you can change it during cluster bootstrapping. Default settings also vary across cloud providers.
|Cloud Provider|Default Setting|CLI Flag to Configure|
|---|---|---|
|Google Kubernetes Engine (GKE)|110 (GKE Standard), 32 (GKE Autopilot)|`--default-max-pods-per-node`|
|Azure Kubernetes Service (AKS)|30 (Azure CNI), 110 (kubenet)|`--max-pods`|
|AWS Elastic Kubernetes Service (EKS)|Depends on the network interfaces of the instance; ranges from 4 to 737 (see the AWS documentation for per-instance limits)|N/A|
The table above shows the default number of pods for different hosted Kubernetes services and the CLI flag to use if you want to change this setting when provisioning Kubernetes clusters.
Note that in the case of AWS EKS, there is no CLI flag available because the maximum number of pods on a node depends on the instance type. EKS clusters use the AWS VPC CNI plugin by default, which assigns IP addresses from the node’s Elastic Network Interfaces (ENIs) to pods; the instance type dictates the number of ENIs and therefore the maximum number of pods. To get around this limit, configure your EKS cluster to use a different CNI plugin, such as Calico.
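The ENI-based limit follows the formula AWS documents for the VPC CNI: max pods = ENIs × (IPv4 addresses per ENI − 1) + 2. The sketch below applies it to two instance types whose ENI limits are taken from the AWS documentation:

```shell
# Max pods on an EKS node with the default AWS VPC CNI:
#   max_pods = ENIs * (IPv4 addresses per ENI - 1) + 2
max_pods() {
  local enis=$1 ips_per_eni=$2
  echo $(( enis * (ips_per_eni - 1) + 2 ))
}

max_pods 3 10   # m5.large: 3 ENIs, 10 IPv4 addresses each -> 29
max_pods 2 2    # t3.micro: 2 ENIs, 2 IPv4 addresses each  -> 4
```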
For each Kubernetes cluster, you need to define an IP CIDR block for the pods. The network plugin uses this CIDR to assign IP addresses to the Pods. This setting works closely with the previous Max Allowed Pods setting. Let’s take a look at a few examples to make the concept clear.
First, let’s walk through an example using Pod CIDR settings for Azure Kubernetes Service (AKS).
At its maximum setting, AKS allows up to 250 pods on each node. Therefore, a cluster with 10 nodes can have up to 2,500 pods.
Azure allocates a subnet with 254 usable host addresses (i.e., a /24) for each node. Therefore, we have four spare IP addresses for cases where pods are being created and terminated simultaneously. Of course, you can override all of these settings with CLI flags.
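The arithmetic behind those four spare addresses can be sketched in a couple of lines:

```shell
# A /24 subnet has 2^(32-24) - 2 = 254 usable host addresses
# (network and broadcast addresses excluded).
usable=$(( (1 << (32 - 24)) - 2 ))
spare=$(( usable - 250 ))   # 250 = AKS per-node pod maximum
echo "$usable usable, $spare spare"   # -> 254 usable, 4 spare
```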
Let’s look at another example, this time for Google Kubernetes Engine (GKE). Suppose you configure a cluster to have 32 pods per node. GKE assigns each node a /26 block for pod IPs (i.e., 64 IPs). Next, you configure the pod IP CIDR as /17, which provides a total of 32,766 usable IPs. As a result, you have a cluster that can support 511 (32,766 / 64) nodes.
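The same calculation, written out as a quick sketch:

```shell
# /17 pod CIDR -> 2^(32-17) addresses, minus 2 reserved = 32766 usable.
pod_ips=$(( (1 << (32 - 17)) - 2 ))
# Each node reserves a /26 block -> 64 addresses per node.
per_node=$(( 1 << (32 - 26) ))
max_nodes=$(( pod_ips / per_node ))
echo "$max_nodes"   # -> 511
```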
A 500-node cluster is massive, and if you provision resources in a single VPC, or have a network structure where you need to create VPC peerings, IP addresses are a precious resource. If your application never scales to require that many nodes, you waste IP addresses. In addition to IP waste, you incur significant resource waste if the application cannot scale accordingly.
Closely related to the Pod CIDR setting, the Node Network CIDR defines the subnet for the cluster nodes. To understand how it works, let’s expand on the previous GKE example.
If you configured max allowed pods and the Pod CIDR so that your cluster can support up to 510 nodes, then the Node CIDR comes out to /23. Therefore, you need to provision your GKE cluster in a network with a CIDR of /23 or larger (e.g., /22 or /21).
This requirement means you are reserving a lot of IP addresses in a single subnet. If not configured correctly, it could lead to a shortage of IP addresses for non-cluster hosts. It can also make creating VPC peerings with other VPCs difficult, as the large address space increases the chance of overlap.
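To see where the /23 comes from, this sketch searches for the smallest prefix whose usable host count covers 510 nodes:

```shell
# Find the smallest subnet prefix that provides at least `nodes`
# usable host addresses (2^(32-prefix) minus network/broadcast).
nodes=510
prefix=32
while [ $(( (1 << (32 - prefix)) - 2 )) -lt "$nodes" ]; do
  prefix=$(( prefix - 1 ))
done
echo "/$prefix"   # -> /23 (510 usable hosts)
```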
As we can see, carefully curating network ranges is critical because a single mistake can lead to significant waste. Fortunately, there is a solution that employs complex machine learning algorithms to perform these tasks for you. We will learn about it in a bit, but first, let’s look at a few Kubernetes resource-level settings that can help you reduce waste.
Several resource-level considerations can impact K8s node capacity management. Let’s take a look at those next.
Kubernetes can schedule pods based on which pods are already running on a particular node. Labels and the Kubernetes scheduler handle all this magic. Deployments, StatefulSets, Jobs, and CronJobs expose settings that let you ensure pods with a matching label, or set of labels, are either co-located on the same node or scheduled onto different nodes.
If configured incorrectly, this can lead to scenarios where a particular node is overloaded while another node in the cluster is underutilized.
Therefore, setting pod affinities correctly is extremely important. You can read more about them in the Kubernetes documentation.
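As a sketch, a Deployment’s pod template can use pod anti-affinity to keep replicas off the same node (the `app=web` label is illustrative, not from any particular application):

```yaml
# Hypothetical pod-template fragment: spread replicas labeled app=web
# across nodes so that no two land on the same host.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        topologyKey: kubernetes.io/hostname
```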
Like the inter-pod affinity/anti-affinity discussed above, we can use node labels to decide where pods get scheduled. For example, you can label nodes that have high CPU capacity as cpu=high-cpu using the following command:
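A sketch of the command (`<node-name>` is a placeholder for your node’s name):

```shell
# Add the cpu=high-cpu label to a node.
kubectl label nodes <node-name> cpu=high-cpu
```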
Then provide this label in the pod spec of your deployment. With this configuration, all pods in that deployment are assigned to nodes with high CPU, making nodes with this label ideal for CPU-intensive workloads.
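A minimal way to reference the label is a `nodeSelector` in the pod template (a simpler alternative to full node affinity rules):

```yaml
# Hypothetical Deployment pod-template fragment: schedule only onto
# nodes carrying the cpu=high-cpu label applied earlier.
spec:
  nodeSelector:
    cpu: high-cpu
```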
Node affinity also comes in handy when you schedule pods to nodes provisioned in a particular availability zone or region for your Kubernetes cluster.
While node affinity helps accomplish advanced scheduling scenarios, such as maintaining service availability across availability zones or scheduling pods based on underlying node resources, you should use it with caution. Inefficient node labeling can cause pods to be scheduled incorrectly, leading to resource waste.
Kubernetes provides the ability to configure CPU and memory requests and limits for pods. While this is a popular feature, it is one of the top sources of resource waste because it is tricky to get right. The following table shows what can happen when these requests and limits are misconfigured.
|Setting|Set Very High|Set Very Low|
|---|---|---|
|CPU Limit|The pod may consume too much CPU, throttling other pods|The pod gets throttled very frequently|
|CPU Request|The pod is difficult to schedule|The pod is scheduled quickly but may land on a node with minimal burst capacity|
|Memory Limit|The pod may consume too much memory, leaving less room for other pods|The pod easily exceeds its limit and is terminated by the Out of Memory (OOM) killer|
|Memory Request|The pod is difficult to schedule|The pod may land on a node with little free memory and be terminated by the OOM killer|
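As a sketch, requests and limits are set per container in the pod spec; the values below are illustrative, not recommendations:

```yaml
# Illustrative container resource configuration.
resources:
  requests:
    cpu: "250m"      # scheduler reserves a quarter of a core
    memory: "256Mi"  # scheduler reserves 256 MiB
  limits:
    cpu: "500m"      # throttled when usage exceeds half a core
    memory: "512Mi"  # OOM-killed when usage exceeds 512 MiB
```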
Kubernetes resource misconfiguration can waste resources through low utilization of the provisioned virtual machines (VMs) serving as nodes.
The cluster nodes may be virtual machines provisioned in a public cloud or a data center. Node-level optimization is critical since the size and number of nodes determine the cluster’s financial costs.
Before we delve into node-level capacity management, we must remind our readers that the misconfiguration of pod-level requests and limits is the primary cause of resource waste at the node level. In other words, the waste created within the pods propagates to the cluster nodes. Therefore, administrators should analyze node and pod capacity in tandem, not in isolation.
The complexity of node capacity optimization warrants using specialized tools that leverage sophisticated techniques, advanced analytics, and automation to help you select the correct size and type of nodes for your cluster (see Densify).
Let’s take a look at the top three considerations for Kubernetes node capacity management.
Kubernetes offers granular configuration control for CPU and memory resources but doesn’t regulate network and disk I/O usage. This lack of control means that network and disk can become performance bottlenecks that go unnoticed until end users complain about a slow user interface or, worse, face an outage.
Nodes typically have a maximum network port capacity of 100 Megabits per second (Mbps), 1 Gigabit per second (Gbps), or 10 Gbps. Administrators must choose network interface card (NIC) throughput according to historical utilization measurements to avoid creating data transfer bottlenecks.
The storage volumes on the nodes also have a maximum throughput capacity measured in input/output operations per second (IOPS). IOPS capacity is a common cause of application slowdowns and outages, especially for data-intensive applications involving databases or inter-application messaging.
Measuring resource usage is complex because sub-second data samples must be aggregated into a higher level of granularity (minute-level or hourly) before being used for capacity analysis. Unsophisticated methods of rolling up performance data lead to application performance problems that are difficult to diagnose. For example, you may size a storage volume’s maximum IOPS in a public cloud based on hourly data analysis and overlook, while averaging the data, short IOPS spikes lasting 100 milliseconds that cause application performance problems.
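To see how averaging hides spikes, here is a small sketch using made-up 100 ms IOPS samples (the numbers are illustrative only):

```shell
# Ten hypothetical 100ms IOPS samples: mostly idle, one short spike.
samples="100 100 100 100 4000 100 100 100 100 100"
sum=0; n=0; peak=0
for s in $samples; do
  sum=$(( sum + s ))
  n=$(( n + 1 ))
  if [ "$s" -gt "$peak" ]; then peak=$s; fi
done
avg=$(( sum / n ))
echo "avg=$avg peak=$peak"   # avg=490 vs peak=4000
```

A volume sized to the 490-IOPS average would be overwhelmed eight times over during the spike.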
There are over one million permutations available when configuring an EC2 instance, the virtual machine service offered by Amazon Web Services (AWS). EC2 instances serve as the nodes hosting a Kubernetes cluster in AWS. The selection parameters span the instance family and generation, vCPU and memory configuration, network bandwidth, attached storage, and pricing model.
Some nodes may have GPU cores in addition to traditional CPUs, requiring additional considerations. Also, keep in mind that AWS regularly launches new EC2 instance types and new hardware generations of existing types. Once again, this level of complexity calls for specialized tools that combine advanced analytics with automation to match the right node type and size to each workload profile.
Managing Kubernetes Node Capacity is not a trivial task. It depends on multiple factors that require automated analysis. Densify helps solve the Kubernetes node capacity management problem by using advanced analytics to recommend an optimal configuration for Kubernetes resources like pods and nodes and eliminate performance bottlenecks and financial waste. Request a free trial here!
Follow our LinkedIn monthly digest to receive more free educational content like this.