Kubernetes Cluster Autoscaler is one of the most popular automation tools for Kubernetes hardware capacity management. It is supported by the major cloud platforms and can streamline the Kubernetes (K8s) cluster management process.
To help you get started with Kubernetes Cluster Autoscaler, in this article, we’ll explain how it handles capacity management by adding and removing worker nodes automatically, its requirements, and provide examples of how to use it in practice.
Cluster AutoScaling Overview
As new pods are deployed and replica counts for existing pods increase, cluster worker nodes can use up all their allocated resources. As a result, no more pods can be scheduled on existing workers. Some pods can go into a pending state, waiting for CPU and memory and possibly creating an outage. As Kubernetes admin, you can manually solve this problem by adding more worker nodes to the cluster to enable scheduling of additional pods.
The problem is this manual process is time-consuming and scales poorly. Fortunately, Kubernetes Cluster Autoscaler can solve this problem by automating capacity management. Specifically, Cluster Autoscaler automates the process of adding and removing worker nodes from a K8s cluster.
Most cloud providers support Cluster Autoscaling, but it’s not supported for on-prem self-hosted K8s environments. Cluster Autoscaling is a “cloud-only” feature because on-prem deployments lack the APIs for automatic virtual machine creation and deletion required for the autoscaling process.
By default, Cluster Autoscaler is installed on most K8s cloud installations. If Cluster Autoscaler isn’t installed in your cloud environment but is supported, you can manually install it.
Cluster AutoScaling Requirements and Supported Platforms
Support for Cluster Autoscaler is available on these Kubernetes platforms:
Google Kubernetes Engine (GKE)
Azure Kubernetes (AKS)
Elastic Kubernetes Service (EKS)
as well as several other less popular K8s cloud platforms.
Each cloud provider has its own implementation of Cluster Autoscaler with different limitations.
You can enable Cluster Autoscaler during cluster creation using a platform-specific GUI or CLI method. For example, on GKE the command below enables Cluster Autoscaler on a multi-zone cluster with a one-node per zone minimum and four-node per zone maximum:
How Cluster Autoscaling Works
Kubernetes scheduler dynamically places pods on worker nodes using a best-effort QoS strategy. For Cluster Autoscaler to work as expected and applications to get the underlying host resources they need, resource requests and limits must be defined on pods. Without resource requests and limits, Cluster Autoscaler can’t make accurate decisions.
Cluster AutoScaler periodically checks the status of nodes and pods and takes action based on node utilization or pod scheduling status. When Cluster Autoscaler detects pending pods on the cluster, it will add more nodes until pending pods are scheduled or the cluster reaches the max node limit. Cluster Autoscaler will remove extra nodes if node utilization is low and pods can move to other nodes.
Identify under/over-provisioned K8s resources and use Terraform to auto-optimize
Now that we understand how Cluster Autoscaler works, we can get started using it in practice. This section will walk through an example application deployment on Google Cloud Platform using Google Kubernetes Engine (GKE) with Cluster Autoscaler enabled with resource requests and limits defined.
We’ll scale the application by increasing the number of replicas until the autoscaler detects the pending pods. Then we’ll see the autoscaler events with a new node added. Finally, we will scale down the replicas for the autoscaler to remove the extra nodes.
To begin, create a demo cluster with 3 worker nodes
Enable autoscaling on the cluster
Get the number of initial nodes created
Create the example deployment application with resource requests and limits defined so that Kubernetes Scheduler can allocate pods on nodes with required capacity and Cluster Autoscaler can allocate more nodes when needed.
Free 45 minute K8s optimization consulting if you run more than 1,000 containers
Get the number of nodes , they are still not scaled down yet , as the autoscaler takes time to detect unused nodes
Let's wait for a few minutes (5–15 minutes) and see the scale down events related to node removal
Check the number of nodes, latest update
Now we see that the number of nodes is scaled back down to the initial number of nodes in the cluster.
Cluster Autoscaler Limitations
Cluster Autoscaler is a helpful tool but has limitations. This section summarizes its main shortcomings and points to commercial tools that can help you overcome them.
Cluster Autoscaler is not supported on on-premise environments until an autoscaler is implemented for on-premise deployments.
Scaling up is not immediate. Therefore, a pod will be in a pending state for a few minutes until a new worker is added.
Some cluster workers may have other dependencies, such as local volumes bindings from other pods. As a result, a node may be a candidate for removal but can’t be removed by Cluster Autoscaler.
Cluster Autoscaler works based on resource requests, not actual usage. This fact can lead to mis-allocating nodes if resource requests and limits are not properly calculated and set. This issue is critical because you can waste resources in your cluster under the false impression that autoscaling is addressing them. Supplemental tools can help analyze the effective efficiency throughout the cluster (container, pod, namespace, node) to avoid pockets of waste due to misconfiguration.
Cluster Autoscaler adds additional nodes but administrators are responsible for defining the right size for each node. The same commercial tools that help with optimizing the node size while also identifying the cluster’s wasted capacity due to unused resource requests