Today, application deployments often use multiple container instances in parallel. Kubernetes (K8s) is the most popular platform for orchestrating and managing these container clusters at scale.
One of the main advantages of using Kubernetes as a container orchestrator is the dynamic scaling of container pods. To enable dynamic scaling, Kubernetes supports three primary forms of autoscaling: the Horizontal Pod Autoscaler (HPA), the Vertical Pod Autoscaler (VPA), and the Cluster Autoscaler.
In this article, we’ll take an in-depth look at the first of these three, HPA. Specifically, we’ll explore what horizontal pod autoscaling is, how it works, and provide some examples of using HPA for autoscaling application pods on Kubernetes clusters.
We cover the limitations of Kubernetes autoscaling in the last section of this article. For now, it's worth noting that Kubernetes autoscaling can still result in resource waste. One reason is that users often request (or reserve) more CPU and memory for individual containers than those containers actually use. A second reason is that Kubernetes autoscaling does not account for disk I/O, network, or storage usage, which can lead to poor allocation. A third reason is that accurate resource optimization requires advanced data aggregation and machine learning techniques that are provided by commercial software and not yet available in open-source form.
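To illustrate the first point, consider a container spec like the hypothetical one below, where the reserved resources far exceed what the application typically consumes; the surplus sits idle on the node regardless of how the pod count is scaled (names and figures are illustrative, not from a real workload):
resources:
  requests:
    cpu: "1000m"      # one full core reserved
    memory: "2Gi"     # 2 GiB reserved
  limits:
    cpu: "1000m"
    memory: "2Gi"
# If the application typically uses ~100m CPU and ~200Mi of memory (hypothetical figures),
# roughly 90% of the reserved capacity is wasted.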
In Kubernetes, you can run multiple replicas of application pods using a ReplicaSet (Replication Controller) by configuring the replica count in the Deployment object. Manually setting the pod replica count to a fixed, predetermined number may not meet the application's workload demand over time. To optimize and automate this process, Kubernetes provides a resource called the Horizontal Pod Autoscaler (HPA).
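For context, manually scaling means either setting the replicas field in the Deployment spec or running a one-off imperative command such as the following (the deployment name is hypothetical):
kubectl scale deployment/my-app --replicas=3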
HPA can increase or decrease pod replicas based on a metric such as pod CPU utilization, pod memory utilization, or a custom metric such as the number of API calls. In short, HPA provides an automated way to add and remove pods at runtime to meet demand. Note that HPA only works for pods that are stateless or that otherwise support scaling out of the box; workloads that cannot run as multiple instances/pods cannot use HPA.
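As an example, scaling on memory utilization uses the autoscaling/v2 API rather than the autoscaling/v1 API shown later in this article. A minimal sketch, assuming a Deployment named example-app exists (both names below are hypothetical), might look like this:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: example-app-memory-hpa      # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: example-app               # hypothetical Deployment
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 70      # target 70% of the memory request, averaged across pods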
Every Kubernetes installation includes support for the HPA resource and its associated controller by default. Note, however, that the HPA relies on the Kubernetes Metrics API (typically provided by the metrics-server add-on) to obtain CPU and memory metrics.
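A quick way to check whether the Metrics API is available (assuming metrics-server is installed in the kube-system namespace, its usual location):
kubectl -n kube-system get deployment metrics-server
kubectl top nodes    # returns an error if the Metrics API is unavailable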
The HPA control loop continuously monitors the configured metric, compares it with the target value of that metric, and then decides to increase or decrease the number of replica pods to achieve the target value.
The diagram shows that the HPA resource works with the deployment resource and updates it based on the target metric value. The pod controller (Deployment) will then either increase or decrease the number of replica pods running.
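Under the hood, the controller computes the desired replica count as desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue). As an illustrative calculation: if 2 replicas are running at an average CPU utilization of 200% against a 50% target, the controller recommends ceil(2 × 200 / 50) = 8 replicas, clamped to the configured minimum and maximum.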
Without a contingency, one problem that can occur in these scenarios is thrashing: a situation in which the HPA performs subsequent autoscaling actions before the workload has finished responding to prior ones. The HPA control loop avoids thrashing by acting on the largest pod-count recommendation from the last five minutes (the default downscale stabilization window).
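If the default five-minute window does not fit your workload, the autoscaling/v2 API exposes a behavior field for tuning scale-down. A minimal sketch of the relevant spec fragment (values are illustrative):
behavior:
  scaleDown:
    stabilizationWindowSeconds: 120   # consider recommendations from the last 2 minutes instead of the default 300 seconds
    policies:
    - type: Pods
      value: 1
      periodSeconds: 60               # remove at most one pod per minute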
This section will walk through example code showing how HPA can be configured to auto-scale application pods based on a target CPU utilization. There are two ways to create an HPA resource: imperatively, using the kubectl autoscale command, or declaratively, by defining a HorizontalPodAutoscaler manifest. The walkthrough below demonstrates both.
The following steps create a Kubernetes Deployment and an HPA object that auto-scales that Deployment's pods based on CPU load, shown step by step along with comments.
Create a namespace for HPA testing
kubectl create ns hpa-test
namespace/hpa-test created
Create a deployment for HPA testing
cat example-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
  namespace: hpa-test
spec:
  selector:
    matchLabels:
      run: php-apache
  replicas: 1
  template:
    metadata:
      labels:
        run: php-apache
    spec:
      containers:
      - name: php-apache
        image: k8s.gcr.io/hpa-example
        ports:
        - containerPort: 80
        resources:
          limits:
            cpu: 500m
          requests:
            cpu: 200m
---
apiVersion: v1
kind: Service
metadata:
  name: php-apache
  namespace: hpa-test
  labels:
    run: php-apache
spec:
  ports:
  - port: 80
  selector:
    run: php-apache
kubectl create -f example-app.yaml
deployment.apps/php-apache created
service/php-apache created
Make sure the deployment is created and the pod is running
kubectl get deploy -n hpa-test
NAME READY UP-TO-DATE AVAILABLE AGE
php-apache 1/1 1 1 22s
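Optionally, confirm the pod itself is running (the pod name suffix is generated, so your output will differ):
kubectl -n hpa-test get pods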
After the deployment is up and running, create the HPA using the kubectl autoscale command. This HPA will maintain a minimum of 1 and a maximum of 5 replica pods of the deployment, aiming to keep the average CPU utilization across the pods at 50%.
kubectl -n hpa-test autoscale deployment php-apache --cpu-percent=50 --min=1 --max=5
horizontalpodautoscaler.autoscaling/php-apache autoscaled
The declarative form of the same command would be to create the following Kubernetes resource
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: hpa-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
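To use the declarative form, save the manifest to a file (the filename below is arbitrary) and apply it:
kubectl apply -f php-apache-hpa.yaml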
Examine the current state of the HPA
kubectl -n hpa-test get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/50% 1 5 1 17s
Currently, there is no load on the running application, so the current and desired replica counts are both equal to the initial value of 1.
kubectl -n hpa-test get hpa php-apache -o yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: hpa-test
  resourceVersion: "402396524"
  selfLink: /apis/autoscaling/v1/namespaces/hpa-test/horizontalpodautoscalers/php-apache
  uid: 6040eea9-0c2b-47de-9725-cfb78f17fe32
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 1
  desiredReplicas: 1
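For more detail, including the scaling events recorded by the controller, you can also describe the HPA:
kubectl -n hpa-test describe hpa php-apache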
Now run the load test and see the HPA status again
kubectl -n hpa-test run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
If you don’t see a command prompt, try pressing enter.
OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!OK!
kubectl -n hpa-test get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 211%/50% 1 5 4 10m
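With average utilization at 211% of the 50% target, the controller keeps adding replicas (up to the configured maximum of 5) until the average drops back toward the target. You can confirm the new replica count on the Deployment itself; the output varies with the load:
kubectl -n hpa-test get deploy php-apache
kubectl -n hpa-test get pods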
Stop the load by pressing CTRL-C, then check the HPA status again. After the downscale stabilization window elapses, things return to normal, with a single replica running.
kubectl -n hpa-test get hpa
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
php-apache Deployment/php-apache 0%/50% 1 5 1 20h
kubectl -n hpa-test get hpa php-apache -o yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: hpa-test
  resourceVersion: "402402364"
  selfLink: /apis/autoscaling/v1/namespaces/hpa-test/horizontalpodautoscalers/php-apache
  uid: 6040eea9-0c2b-47de-9725-cfb78f17fe32
spec:
  maxReplicas: 5
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 1
  desiredReplicas: 1
  lastScaleTime: "2021-07-04T08:22:54Z"
Clean up the resources
kubectl delete ns hpa-test --cascade
namespace "hpa-test" deleted
Learn more about using machine learning and state-of-the-art resource optimization techniques to complement the Kubernetes autoscaling functionality.