A 2024 survey by the Cloud Native Computing Foundation (CNCF) revealed that Kubernetes accounts for a significant portion of public cloud expenditure: 28% of responding organizations reported that it drives up to 50% of their cloud costs.

Another 2024 report found that 37% of organizations have 50% or more of their workloads in need of container rightsizing to improve cost efficiency. The combination of heavy investment and significant resource over-provisioning leads to wasted resources, increased costs, and hidden risks that are impossible to address manually at scale.

Watch this 20-minute webinar to learn about the top 5 strategies for optimizing your Kubernetes environments. We cover key tactics for identifying the most significant areas of impact, fixing high-risk workloads first, and applying automation strategies.

Top 5 Strategies for Kubernetes Optimization

  1. Use autoscaling to provide elasticity
  2. Set requests and limits on all workloads
  3. Choose node types and families to match your workloads
  4. Use automation to optimize at scale
  5. (Advanced) For environments where cost is a critical factor, consider enabling bin packing

[Video transcript]

Good morning, good afternoon, wherever you may be. Thank you very much for joining us today. We're very pleased to run through a topic that's top of mind for many folks and many organizations these days, and that's five strategies for Kubernetes optimization. My name is Andy Walton; I'm joined by David Chase. We've got some great content to go through, along with short sneak peeks of our new product, Kubex, a new interface. We'll talk a little more about that at the end; we're very excited to show it to you for the first time ever.

Let's focus on the strategies for Kubernetes optimization. David's driving today, and he's going to go through why Kubernetes optimization matters. Kubernetes, as many folks know, is a pretty popular platform right now.

It's the fastest growing open source project in history outside of Linux. It's estimated that by 2027, more than 90 percent of global organizations will be running Kubernetes-based applications in production. The second point is explosive growth. We often see that many of the organizations we talk to don't pay too much attention to how things get set up during that growth, and it's often accompanied by inefficiencies in how they're operating. David, maybe you've got a few insights on that.

Yeah, I was just going to say that it makes perfect sense: while you're experiencing explosive growth, you want to remain competitive and get things in place very quickly, so yes, some things will fall by the wayside from an efficiency perspective. There's this mentality that tends to pervade, which is "we'll fix it later." And that's perfectly valid: if you want to be nimble, some things may not be perfect in the first iteration, and you fix them in later iterations. But it does mean there's potentially some room for optimization over time.

And then finally, following that explosive growth, organizations have moved more and more workloads into Kubernetes and are now getting into a bit more of a steady state. There does seem to be some focus on consolidating workloads and more scrutiny around how much money is being spent. So the situation for many of the organizations we're now talking to is that they already have a whole bunch of workloads running, and now they're interested in getting rid of some of that waste. But when we talk about optimization, David, we're talking about more than just saving money, right?

Yeah, absolutely. The bullets we've got up here tend to focus on cost savings and inefficiencies, but you've got to keep in mind that Kubernetes changes the model: we're moving back to shared infrastructure, and therefore optimization is not only about saving money. It's also about making sure that you've minimized risk within your environment, that your workloads are healthy, and that you're not having an impact on other workloads that are running side by side.

So when we talk about optimization, what we’re going to talk about here is very much not just about cost optimization, but also making sure that we’re eliminating risk within our environment as well.

That's great. Well, we've got a few examples of that today as well. So let's get into the actual five strategies. I guess I should ask you, David, are these in any particular order? Some of this seems to be low-hanging fruit, and it gets a little more, I won't say complicated, but at least one, two, and three are things you can start off with fairly quickly. Is that right?

Yeah, exactly. So as we go through these, they’re essentially in the chronological order that you would typically want to run through them in your environment because some of these build on others. For some of these strategies, you’re going to want to make sure you do the foundational work first, that then allows you to be in a healthier state that allows you to move on to the further optimizations. And then kind of at the end of our list, we have one that’s kind of a bonus that is a much more advanced strategy. 

Okay, well, let's dig into the first one here. The first strategy is to leverage autoscaling technology to provide elasticity. Funny enough, one of the last webinars you and I did a few weeks ago talked about the fact that autoscaling is not optimization, not by itself. It is a strategy that addresses some of the optimization picture, though. Maybe you want to dig in a little here.

Yeah, absolutely. You're right that autoscaling is part of the big picture for optimizing your environment, but it's not optimization on its own, and sometimes there's confusion that they're the same thing. So when you think about autoscaling and why it's important, it's all about providing elasticity within your environment. One of the foundations of a cloud-native architecture is that you enable autoscaling and you size your workloads appropriately for demand, so that your capacity matches your demand at all times.

So if you, for example, have a spike in traffic to your website or your application, you're able to scale out your capacity to meet that demand. But at the same time, the other part that's very important is that once the demand drops off, you want to use autoscaling to scale those resources back in, so you're not paying for hardware and capacity that isn't actually being utilized. There are a lot of autoscaling technologies in heavy use today, and they're probably things everyone here will be familiar with or at least have heard of. You've got the Horizontal Pod Autoscaler (HPA), which adds pods to your environment based on demand; basically it allows parallelism, and that's how you add capacity. There's also the Vertical Pod Autoscaler (VPA), which scales up an individual pod by adding additional CPU and RAM. And then you've got things like Cluster Autoscaler.

Now, the first two work at the pod level, adding capacity at the pod level. But as you add more pods, you potentially need more compute capacity. That's where something like Cluster Autoscaler comes in, adding and removing nodes from your cluster based on demand. So again, that's a very important part of making sure that you have the right amount of capacity.

And then the one that's not depicted in this graphic, and it's a very hot topic these days, is Karpenter. Karpenter is a drop-in replacement for Cluster Autoscaler; it just has a much greater feature set. It's more responsive, faster, and a lot more flexible. So having some mix of these technologies in place in your cluster is absolutely integral to making sure that you've matched your capacity to demand and that you're not paying for capacity you don't need.
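To make the elasticity piece concrete, here is a minimal sketch of the kind of HPA configuration being described; the Deployment name, replica bounds, and CPU target below are illustrative placeholders, not values from the webinar.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa                  # placeholder name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                    # the workload to scale out and back in
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # add pods when average CPU use passes ~70% of requests
```

As the HPA adds and removes pods, Cluster Autoscaler or Karpenter can then grow or shrink the node pool underneath them.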

That's a good overview of autoscaling technologies. Thanks. Let's go to the second strategy. This is one we see a lot, where companies don't set the requests and limits on all their workloads. Here's a histogram; we're actually going to use the product, Kubex, in a moment to dive in and have a closer look at what this looks like in practice.

But you'll notice for the CPU and memory requests and the memory limits, we're highlighting workloads where those values are not set, and that can be a big problem if companies just say, "I'm just gonna let Kubernetes handle that for me." Isn't that right, David?

Absolutely. The way Kubernetes works is that it manages resources based on the requests and limits you've set for your pods and containers. So I would argue that it is essentially impossible to have a healthy Kubernetes cluster unless you're setting requests and limits on all of your workloads. And, as we'll dive into a bit more when we look at the UI, it's also about making sure those values are as accurate as possible, because even if you've got requests and limits set, if they're not accurate, that could still cause issues in your environment.

Seeing is believing, so let's have a look at what this looks like in practice. What we're showing here is a sample environment, just one of our lab environments, and it's showing that for several containers we're not setting requests and limits. That is not ideal. The reason is, again, that the Kubernetes scheduler manages resources based on what your requests and limits are. If one of your nodes is running out of resources and you haven't set requests on a container, Kubernetes is going to look at how much memory, for example, has been requested versus how much is actually being used. The containers and pods with the biggest gap between what they've requested and what they're actually using are the most likely to be killed to free up capacity on a node that's already at capacity. So again, you have this risk of an unhealthy environment.

If you're not setting requests and limits, and the containers being killed to make room are mission-critical workloads, then you potentially have a significant issue in your environment. The other thing I mentioned is that it's not just about setting requests and limits, though that's extremely important and we believe they should be set across essentially your entire environment; it's also about setting them accurately.
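For reference, this is roughly what setting requests and limits on a container looks like; the values below are illustrative placeholders, and whether to set a CPU limit is the judgment call discussed a little later in the webinar.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                        # placeholder name
spec:
  containers:
  - name: app
    image: registry.example.com/app:1.0    # placeholder image
    resources:
      requests:
        cpu: "250m"        # what the scheduler reserves for this container on a node
        memory: "256Mi"
      limits:
        memory: "512Mi"    # the container is killed if it exceeds this
        # CPU limit intentionally omitted; some operators prefer not to set one
```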

So for example, if we click here and drill down into the details some more, what we can see are all the containers in our environment that have a huge gap between the amount of memory they requested and the amount of memory they're actually using. You'd want to mitigate these by setting those requests appropriately, because any of these containers is potentially at risk of being terminated if we run out of resources on a node.

By the same token, let me just clear this filter here and sort in the other direction. Now we can see the waste taking place in our environment, where we've set our resource requests far too high, so we're setting aside hardware that we're not actually using. Again, these are containers you'd want to rightsize to make sure there's no waste in your environment.

And then the other thing, as we mentioned, is that you don't want to have unset requests and limits. So, by the same token, we can just sort here, and we can see all the containers that have unset requests. You would want to set these for all of your containers; you don't want to leave requests unspecified, with the potential exception of pods in the kube-system namespace. Other than that, you want to be setting requests on all of your workloads. The thing here is that the memory request value we show is actually a setting that you could start with initially.

So, for folks that aren’t setting these things, we’re giving them an indication as to what that initial setting might be.

The other thing that's important, and we've talked about this before, is the order of operations for setting these things correctly. A lot of the time, what you want to do is address the risk first and then go after the waste.

And for us, risk also means taking a closer look at these unspecified settings: memory requests and CPU requests. I think limits are a bit different; memory limits you want to set, but it looks like a lot of folks don't even bother with CPU limits. Is that right?

Yeah, absolutely. It's a matter of opinion; different operators will have different views on this. Some customers like to set CPU limits, while others believe there's not a lot of value in setting them. Personally, I'm in the second camp. But regardless, we're able to make recommendations for our customers on what those requests and limits should be, whether you implement CPU limits or not. Great. So strategy number two is: set your requests and limits on all the workloads.

Let's move on to number three. This is an interesting one, because where the money actually gets spent is at the node level: the nodes you're running on, whether those are in scaling groups or just plain IaaS instances in the cloud. Choose the node type and family to match your workload. What do we mean by that?

Great question. If you look at your environment and, for example, memory utilization is the primary constraint, that means there might be some CPU going underutilized. If you dig deeper into those node groups and you see that the requested or average CPU utilization for those workloads is below 50%, it means you could change instance types and save costs. So let's say you're running a general-purpose instance, and you notice that memory is filling up and close to being fully utilized, but CPU utilization is averaging below 50%.

You can actually very quickly just change to a memory optimized instance type. So you keep the exact same amount of memory, but you cut your CPU in half and that’s going to have immediate cost savings for you. So that’s just another very simple strategy. Make sure that you’re looking at your workloads, understand how they’re utilizing resources like CPU and memory, and make sure that you’re choosing node types that are appropriate for those workloads, because that can be a very quick win for you.
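To make that concrete with a hypothetical example (the instance types here are illustrative, not taken from the webinar): a node group of general-purpose m5.2xlarge instances (8 vCPU, 32 GiB) whose memory is nearly full but whose CPU averages under 50% could move to memory-optimized r5.xlarge instances (4 vCPU, 32 GiB), keeping the same memory per node while paying for half the vCPUs.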

It works in reverse as well. If CPU is your primary constraint, you might want to look at whether your memory is fully utilized, and if it's heavily underutilized, you might go with a compute-optimized instance instead. Same sort of savings, same scenario, just using the other resource. Right. And what's interesting is that the screenshots you're seeing, in the last example for strategy two as well as this one, are also from Kubex, our new interface. We're going to talk a little more about that at the end, but you're getting a sneak peek of Kubex in this webinar today, which is fairly exciting for us.

Let's move to number four here. This is a bit of an eye chart, but let's talk a little about the fourth strategy, which is to use automation to optimize at scale. This is another screenshot of Kubex, showing a fairly large environment.

Look in the middle. As an example, the number of pods is about 115,000 and the number of containers is almost 200,000. And I think that's the point of this one: optimizing at scale. In this example we've got a relatively large environment, but when you think about the increasing popularity of Kubernetes and how customers keep moving more workloads onto it, these kinds of numbers will become less and less an example of a large environment.

They'll become average, typical for large organizations, where you're going to see numbers of containers measured in the hundreds of thousands. That is becoming far more commonplace as we look at our customer base. So keep in mind that you couldn't possibly hope to manage and optimize this many resources using manual processes and Excel spreadsheets.

You have to leverage automation to make any kind of dent in something of this scale. Now, you want to have a planned strategy for where you implement automation and where you don't. A lot of our customers express interest in automating environments like dev or test. Those are good examples where the risk is fairly low, and it's very easy to implement automated recommendations and optimizations there. But there are also a lot of customers who identify portions of production environments that can be fully automated. So again, if you really want to be successful, you're just not going to be able to do it with the old model of spreadsheets; you need an automated solution. And some of the automation we're talking about right now is just leveraging the ecosystem that already exists, things like infrastructure-as-code technologies.

Absolutely. A lot of customers are using infrastructure-as-code (IaC) tools, and you have the ability to do that here. If you have a product such as Densify generating these recommendations, you can very easily integrate them with things like Terraform or CI/CD pipelines, where you pull those values out of an API and apply them at deployment time. Another option, and something Densify is going to be implementing fairly soon, is doing things right at the point of instantiation using a mutating admission controller, which basically says: when a resource is created, look up the recommended requests and limits and apply them to those workloads automatically.
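As a rough sketch of the deploy-time pattern described here (the value names below are hypothetical and not Densify's actual API), a CI/CD pipeline could fetch recommended values from a recommendation API, write them into chart values, and render them through a Helm-style template like this:

```yaml
# Hypothetical Helm-style template fragment: the pipeline queries the
# recommendation API and passes the results in as chart values.
resources:
  requests:
    cpu: "{{ .Values.recommended.cpuRequest }}"
    memory: "{{ .Values.recommended.memoryRequest }}"
  limits:
    memory: "{{ .Values.recommended.memoryLimit }}"
```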

That's great. Yeah, we have some exciting news coming in that area, specific to automation. So it's obviously very important to consider automation when you've got tens or hundreds of thousands of workloads; some of our customers are a factor of ten beyond this, into the millions of containers. It can't be done by hand.

And let's talk about strategy five, which is a more advanced topic for environments where cost is really important: considering bin packing in those environments. And I know you're even thinking about doing a separate webinar on this topic one day.

Yeah, stay tuned for a follow-up webinar on this one in early 2025. And to be clear, I'm not suggesting that this is a strategy every customer should use. But for certain environments and certain customers where cost is absolutely critical, bin packing could be a very good fit.

Now, the term bin packing can mean a lot of different things for a lot of different people. I want to be clear here that for this scenario, we’re talking about something very specific. And that is the algorithm that the Kubernetes scheduler uses to choose which nodes to deploy workloads on.

So let me quickly show this scenario where you have a number of nodes and you have to place workloads on those nodes. The default allocation strategy that Kubernetes uses is called LeastAllocated. What that means is that when you go to schedule a workload, Kubernetes looks at your available nodes, identifies which one has the least resources in use, and assigns the pod to that node. That's a great strategy from a risk perspective: it means you're far less likely to have resources that are overutilized, because the scheduler will always put new pods on nodes that have lots of available capacity.

But think back to when we talked about autoscaling. If you have an event with a lot of demand and you scale out to a bunch of additional nodes, say you add six nodes to your environment, then under this LeastAllocated strategy, once demand drops, it becomes very difficult for your cluster to scale those nodes back in. As soon as one of those nodes gets down to low utilization, Kubernetes says, "Oh, I'm going to start putting workloads on you, because you're the least busy of the bunch." So bin packing, in this strict sense, is when you use a different scoring strategy called MostAllocated. With that, when the Kubernetes scheduler goes to place a workload, it looks at the environment and places the workload on the node that is the most busy but still has capacity to host it.

So on the left, we see the environment without bin packing; this is the default algorithm. You can see a fair number of nodes, and they're only partially utilized; you're not going to see many of them at high utilization. But if you're placing your workloads onto the nodes that are the most busy but still have capacity, you're going to see far fewer nodes allocated, and each of those nodes is going to run at higher utilization. It's not necessarily the best strategy for all environments, but where cost is important, the bin packing strategy lets you use fewer nodes, each running at much higher utilization, which from an optimization and cost-savings perspective is far more desirable. It's an option for where it makes the most sense.
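For anyone who wants to see what switching the scheduler's behavior looks like in configuration, the scoring strategy for the NodeResourcesFit plugin can be changed from the LeastAllocated default to MostAllocated in a KubeSchedulerConfiguration; the sketch below is minimal, and the resource weights are placeholders.

```yaml
apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: default-scheduler
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated    # default is LeastAllocated; this favors the busiest node that still fits the pod
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
```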

Well, we're at the end, so let's just summarize the five points we reviewed today. We started with using autoscaling to provide elasticity, with the different technologies: HPA, VPA, Karpenter, Cluster Autoscaler. Then setting the requests and limits on all workloads, making sure you've got those CPU and memory requests, especially memory, set as accurately as possible, not just putting a number in there, so the scheduler is better informed. Then choosing node types and families that better match your workloads, using compute-optimized, memory-optimized, or general-purpose instances to best match the primary constraints in your environment. Then leveraging automation to optimize at scale as these environments get larger and larger. And finally, the one we just covered: enabling bin packing selectively to really drive up cost savings.

We've been showing you snapshots and a small live sneak peek of Kubex. This is our new view into the world of Kubernetes optimization: full-stack, AI-driven resource optimization. We optimize the containers, the nodes, and the cloud instances, and we address all the areas of optimization, which for us includes cutting costs and improving performance. It works across any Kubernetes environment you've got, as well as OpenShift, wherever that may be running.

We are actually going to run a webinar in January; Andrew Hillier, our CTO, is going to lead that discussion and give you a full-blown demonstration of what Kubex looks like. If you'd like to try it for yourself, go to densify.com/product/trial or to our new dedicated Kubex site, kubex.ai, where you can get more information, or to kubex.ai/trial to try it free for 60 days with no credit card required, which is always nice around Christmas time. And if you'd like a little more background on Densify, we've been named a market leader by GigaOm in the Kubernetes Resource Management space, so have a look at that report and see how we compare to the other technologies out there. We compare very well, just to give it away.

We're at the bottom of the hour. I did have one question about Kubex: does it look different from the standard Densify interface? Yes, it's much different, and it's available now if you'd like to try it out.

Follow us on social media: LinkedIn, X, and YouTube.

We're going to wrap up. I'd like to thank everyone for attending today. Go to densify.com for more information.

Have a great holiday season. Thanks, David, for all your information today. It was great.

My pleasure. Thank you.