Today, application deployments often use multiple container instances in parallel. Kubernetes (K8s) is the most popular platform for orchestrating and managing these container clusters at scale.
One of the main advantages of using Kubernetes as a container orchestrator is the dynamic scaling of container pods. To enable dynamic scaling, Kubernetes supports three primary forms of autoscaling:
Horizontal Pod Autoscaling (HPA)
A Kubernetes functionality to scale out (increase) or scale in (decrease) the number of pod replicas automatically based on defined metrics.
Vertical Pod Autoscaling (VPA)
A Kubernetes functionality to right-size the deployment pods and avoid resource usage problems on the cluster. VPA is more related to capacity planning than other forms of K8s autoscaling.
A type of autoscaling commonly supported on cloud provider versions of Kubernetes. Cluster Autoscaling can dynamically add and remove worker nodes when a pod is pending scheduling or if the cluster needs to shrink to fit the current number of pods.
In this article, we’ll take an in-depth look at the first of these three, HPA. Specifically, we’ll explore what horizontal pod autoscaling is, how it works, and provide some examples of using HPA for autoscaling application pods on Kubernetes clusters.
We cover the Kubernetes autoscaling limitations in the last section of this article. For now, it’s worth noting that Kubernetes autoscaling can still result in resource waste. One reason is that users often request (or reserve) more CPU and memory resources for individual containers than they use. A second reason is that the Kubernetes autoscaling functionality does not account for disk I/O and network and storage usage resulting in poor allocation. Another reason is that accurate resource optimization requires advanced data aggregation methods and machine learning technology provided by advanced commercial software not yet available in open source form.
Horizontal Pod Autoscaler (HPA) Overview
In Kubernetes, you can run multiple replicas of application pods using ReplicaSet (Replication Controller) by configuring the number of replica counts in the Deployment object. Manually setting the number of pod replicas to a fixed predetermined number might not meet the application workload demand over time. Therefore, to optimize and automate this process, Kubernetes provides a new resource called horizontal pod autoscaler HPA.
HPA can increase or decrease pod replicas based on a metric like pod CPU utilization or pod Memory utilization or other custom metrics like API calls. In short, HPA provides an automated way to add and remove pods at runtime to meet demand. Note that HPA works for the pods that are either stateless or support autoscaling out of the box. Workloads that can’t run as multiple instances/pods cannot use HPA.
How HPA Works
In every Kubernetes installation, there is support for an HPA resource and associated controller by default.
The HPA control loop continuously monitors the configured metric, compares it with the target value of that metric, and then decides to increase or decrease the number of replica pods to achieve the target value.
The diagram shows that the HPA resource works with the deployment resource and updates it based on the target metric value. The pod controller (Deployment) will then either increase or decrease the number of replica pods running.
Without a contingency, one problem that can occur in these scenarios is thrashing. Thrashing is a situation in which the HPA performs subsequent autoscaling actions before the workload finishes responding to prior autoscaling actions. The HPA control loop avoids thrashing by choosing the largest pod count recommendation in the last five minutes.
Identify under/over-provisioned K8s resources and use Terraform to auto-optimize
Stop the load by pressing CTRL-C and then get the hpa status again , this will show things get back to normal and one replica running
Clean up the resources
Kubernetes HPA Best Practices
To take advantage of Kubernetes features like HPA, you need to develop the application with horizontal scaling in mind. Doing so requires using a microservice architecture to add native support for running parallel pods.
Use the HPA resource on a Deployment object rather than directly attaching it to a ReplicaSet controller or Replication controller.
Use the declarative form to create HPA resources so that they can be version-controlled. This approach helps better track configuration changes over time.
Make sure to define the resource requests for pods when using HPA. Resource requests will enable HPA to make optimal decisions when scaling pods.
Free 45 minute K8s optimization consulting if you run more than 1,000 containers
HPA can’t be used along with Vertical Pod Autoscaler based on CPU or Memory metrics. VPA can only scale based on CPU and memory values, so when VPA is enabled, HPA must use one or more custom metrics to avoid a scaling conflict with VPA. Each cloud provider has a custom metrics adapter to enable HPA to use custom metrics.
HPA only works for stateless applications that support running multiple instances in parallel. Additionally, HPA can be used with stateful sets that rely on replica pods. For applications that can’t be run as multiple pods, HPA cannot be used.
HPA (and VPA) don’t consider IOPS, network, and storage in their calculations, exposing applications to the risk of slowdowns and outages.
HPA still leaves the administrators with the burden of identifying waste in the Kubernetes cluster created by the reserved but unused requested resources at the container level. Detecting container usage inefficiency is not addressed by Kubernetes and requires third-party tooling powered by machine learning.