Vertical Pod Autoscaler (VPA) is a Kubernetes (K8s) resource that helps compute the right size for resource requests associated with application pods (Deployments). This article will explore VPA’s features, provide instructions for using VPA, explain its limitations, and point to an alternative resource optimization approach in conclusion.
Consider the sample Kubernetes resource usage dashboard presented in the screenshot below. We can see CPU and memory resources are over-allocated. In other words, the actual usage of those resources is too low compared to the allocated resources. If you are not as familiar with pod resource allocation, you can learn more about it by reading our article about Kubernetes resource limits and requests.
You may encounter this scenario frequently while managing and administering K8s clusters. As a cluster admin, how can you allocate resources more efficiently?
This problem occurs because the Kubernetes scheduler does not re-evaluate the pod’s resource needs after a pod is scheduled with a given set of requests. As a result, over-allocated resources are not freed or scaled-down. Conversely, if a pod didn’t request sufficient resources, the scheduler won’t increase them to meet the higher demand.
The Vertical Pod Autoscaler or VPA functionality provided in the Kubernetes platform aims to address this problem.
The first reaction to VPA might be, “since legacy stateful applications require expensive refactoring to scale horizontally, let’s instead use a single pod and vertically scale the pod.” Unfortunately, stateful scaling is not yet a supported VPA use case.
For simple Deployments or ReplicaSets (that are not stateful), VPA suggests values of requested pod resources and provides those values as recommendations for future Deployments. Alternatively, we can configure VPA to auto-apply those values to our Deployment objects.
HPA scales the application pods horizontally by running more copies of the same pod (assuming that the hosted application supports horizontal scaling via replication). VPA is still relevant when horizontally scaling as it will compute and recommend the correct values for the requested resources of the pods managed by HPA.
For example, you can use a customized metric such as the number of incoming session requests by end-users to a service load balancer to scale horizontally via HPA. Meanwhile, CPU and memory resource usage metrics can be used separately to adjust each pod’s allocated resources via VPA.
The VPA controller observes the resource usage of an application. Then, using that usage information as a baseline, VPA recommends a lower bound, an upper bound, and target values for resource requests for those application pods.
In simple terms, we can summarize the VPA workflow as:
observe resource usage → recommend resources requests → update resources
Depending on how you configure VPA, it can either:
updateMode = auto).
updateMode = off).
updateMode = initial).
Keep in mind that
updateMode = auto is ok to use in testing or staging environments but not in production. The reason is that the pod restarts when VPA applies the change, which causes a workload disruption.
We should set
updateMode = off in production, feed the recommendations to a capacity monitoring dashboard such as Grafana, and apply the recommendations in the next deployment cycle.
Here is a sample Kubernetes Deployment that uses VPA for resource recommendations.
First, create the Deployment resource using the following YAML manifest shown below. Note that there are no CPU or memory requests. The pods in the Deployment belong to the
VerticalPodAutoscaler (shown in the next paragraph) as they are designated with the kind,
Deployment and name,
Then, create the VPA resources using the following manifest:
Note that the update mode is set to off. This will just get the recommendations, but not auto-apply them. Once the configuration is applied, get the VPA recommendations by using the
kubectl describe vpa nginx-deployment-vpa command.
The recommended resource requests will look like this:
You can set the UpdateMode to auto in the above example to enable auto-update the resource requests (assuming that it’s not being used in a production environment). This will cause the pods to be recreated with new values for resource requests.
As mentioned earlier in this article, you can use both VPA and HPA in a complementary fashion. When used together, VPA calculates and adjusts the right amount of CPU and memory to allocate, while HPA scales the pod horizontally using a custom metric.
Below is an example diagram showing HPA based on Istio metrics. The Istio telemetry service collects stats like HTTP request rate, response status codes, and duration from the Envoy sidecars that can drive the horizontal pod autoscaling. VPA will be driven only by CPU and Memory usage metrics.
The diagram below shows the Kubernetes and Prometheus (monitoring tool) components that come together to enable the use case described above.
While VPA is a helpful tool for recommending and applying resource allocations, it has several limitations. Below are ten important points to keep in mind when working with VPA.
VPA assists in adding or removing CPU and memory resources but its inherent limitations make it too risky to use in a production environment. Optimizing the Kubernetes resources assigned to essential applications requires a holistic optimization of containers, pods, namespaces, and nodes. Machine learning should power such an analysis to avoid human error and scale to support large estates. Another critical factor to consider is the need to consistently apply the same optimization technology to other resources, whether hosted in a public or private cloud. A cross-platform optimization solution reduces the need for organizational training, improves accuracy based on well-established policies, and provides unified reporting. To learn more, read about our holistic approach to optimizing the entire Kubernetes stack.