In December 2024, AWS announced the general availability of EKS Auto Mode, which simplifies Kubernetes management by automatically provisioning and scaling infrastructure, optimizing instance selection, managing core add-ons, and patching operating systems.
We cover what it does, what it doesn’t do, and, most importantly, how it is complemented by Kubex, Densify’s automated Kubernetes optimization solution.
See what Kubex can do for your Kubernetes environment. Watch the 20-minute on-demand webinar.
[Video transcript]
Hi, good morning. Welcome from Toronto, Ontario. Thanks for joining us today for our webinar. My name is Andy Walton, and I’m joined by David Chase. How’s it going today, David? Great, thanks. Today we’ve got a really interesting topic, and one that a number of our customers have been asking us about.
In December 2024, at AWS re:Invent, Amazon announced a new service called EKS Auto Mode. And those customers have been saying, hey, does that thing do what Densify does? So the purpose of this webinar is really to provide you with the highlights of what it does, the highlights of what it doesn’t do, and how it’s actually a very complementary solution to Densify and Kubex. So what exactly is EKS Auto Mode? It’s very much what it sounds like: it simplifies your IT operations, takes care of and manages a number of specific cluster operations, and allows you to configure and deploy easily.
What services does it include? The creation of clusters, roles, and networking is all wrapped up in a nice wizard-based UI, and it’s quite easy to get started. In fact, in 10 to 15 minutes it lets you create a pretty much fully operational cluster, taking away a lot of the manual work you had to do before.
It also automates the deployment of dependent services like virtual private clouds, load balancers, storage, and IAM roles, things you had to set up yourself in the past. And finally, it reduces costs with out-of-the-box node autoscaling; Dave’s going to talk about what it uses under the covers to scale up when you need capacity and scale down when you don’t.
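To make that concrete, here is a minimal sketch of what standing up an Auto Mode cluster can look like as an eksctl config file, an alternative to the console wizard discussed below. This is an illustration, not a verbatim recipe: the cluster name and region are placeholders, and the autoModeConfig block should be verified against your eksctl version.

```yaml
# Sketch only: an eksctl ClusterConfig with EKS Auto Mode enabled.
# The name and region are placeholders; verify the autoModeConfig
# fields against your eksctl version before relying on this.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: auto-mode-demo   # hypothetical cluster name
  region: us-east-1

autoModeConfig:
  enabled: true          # Auto Mode manages compute, storage, and networking
  # nodePools defaults to the built-in pools when omitted
```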
David, you’ve actually configured and worked with the service to get ready for this webinar. What are your thoughts, first off, since you got it up and running? My opinion is that EKS Auto Mode is kind of what EKS was always supposed to be. We’ll talk about this a little bit more.
It doesn’t introduce a ton of additional functionality, but what it does is perfect the experience quite a bit. It gives you a lot of flexibility. It’s very simple to stand up a cluster, just like you said, and it doesn’t take away any of the power that you had before. So it really just makes things very, very simple, but it’s also a building block
that you can use to build on top of and create some pretty robust cluster scenarios. We might as well get into the details, since I just did the high-level piece. Absolutely. So we’re just going to go a level deeper. We’re not going to make this a technical deep dive, just go a little bit deeper into what
EKS Auto Mode actually does and what it includes. First of all, compute management: it has autoscaling built in, automatically adding and removing nodes from your cluster based on workload demand. That’s built into it from the ground up. Another important part of how it manages nodes is making sure that you’re selecting from AMIs that are appropriate for a Kubernetes cluster.
It makes sure that those nodes, when they’re provisioned, have a base level of security assigned to them. It also does things like supporting GPUs, so it’s pretty flexible in how it does compute management. From a patching and currency perspective, it does a couple of things. First, it automatically maintains the OS level, as well as the patching level, of your nodes, which was something you often had to do manually in the past, and it also maintains the currency of any add-ons.
There are a lot of EKS add-ons that you need to deploy in most EKS clusters, and those are now automatically maintained at the latest and greatest version for you. Now, ease of configuration: Andy mentioned that already. It’s really, really easy to stand up an EKS Auto Mode cluster; again, it can be done in about 10 minutes, and where there are things that need to be configured, like IAM roles and whatnot, it’s really very simple.
A lot of the time it’s simply a matter of clicking to create an appropriate role and then accepting one or two defaults, and all of the work is done for you. So again, it’s just a much more polished experience. It also automatically configures things like inter-pod networking, load balancing, and storage; the ephemeral storage drivers are already built in right from the get-go.
The last one is Karpenter node autoscaling. We mentioned that node autoscaling is built into EKS Auto Mode. You may have heard the term Karpenter already; the branding of Karpenter is not at the forefront of EKS Auto Mode.
I think they tend to want the EKS brand at the forefront, but it is there behind the scenes. In the past, it could be a little bit painful to configure Karpenter on an EKS cluster. That complexity is now gone entirely: it’s very simple, out of the box, already configured for you, ready to roll from the moment you provision your cluster.
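As an aside on where those pre-wired defaults live: in Auto Mode, node-level settings such as ephemeral storage are expressed through a NodeClass object, and you only define your own if you want to override something. A hedged sketch, assuming the eks.amazonaws.com/v1 NodeClass API that Auto Mode exposes; the name and size here are invented for illustration.

```yaml
# Sketch only: a custom Auto Mode NodeClass overriding node storage.
# Field names assume the eks.amazonaws.com/v1 NodeClass API; treat
# the values as illustrative rather than recommendations.
apiVersion: eks.amazonaws.com/v1
kind: NodeClass
metadata:
  name: larger-root-volume   # hypothetical name
spec:
  ephemeralStorage:
    size: 120Gi              # larger root volume than the default
```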
In addition, it reduces costs. There are some real things in EKS Auto Mode that help you reduce costs, done again through out-of-the-box autoscaling to make sure that if you have nodes that aren’t in use, those get scaled in. You’re not going to be paying EC2 costs for nodes that are sitting idle.
In addition, right out of the box it’s got support for spot instances, which may not be something you want to use too much in your production environments; that’s a matter of business choice, but it may be quite attractive for dev and test environments. So you have that capability out of the box.
And then just another comment on cost: it’s got a very simple billing model. Basically, you pay a flat cost for the cluster itself, for the control plane, and on top of that you just pay for the EC2 compute, in the form of nodes, that you’re using. And lastly, it is backwards compatible with most of the tools you’ve already been using.
So it fits into your infrastructure without any significant disruption. You can continue using infrastructure-as-code tools like Terraform, and you can continue using tools like eksctl as you always have. Another thing that’s very important to understand is that it’s relatively trivial to take an existing cluster that’s not running in EKS Auto Mode and upgrade it to Auto Mode.
So it really does fit into your infrastructure quite smoothly and without a lot of interruption. A good summary there, David. I was curious a little bit about the EC2 nodes and what sort of infrastructure it’s using under the covers; I guess whatever Karpenter is using, typically spot? I was just curious about that. Yeah, so that’s completely configurable. It can use spot, it can use on-demand instance types, or it can use a selection of both. So really, that’s something you are completely able to configure. I believe the default setting is to enable spot: it creates what’s known as a Karpenter node pool, and that default node pool does, in fact, support spot instances.
However, you can configure that after the fact. In fact, it’s so extensible that, for example, if you decide you don’t even want to use Karpenter, if you prefer the older model of using node groups running on ASGs, you can do that as well with EKS Auto Mode. You’re not restricted to only using Karpenter; there’s a ton of flexibility.
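To make that configurability concrete, here is a hedged sketch of an extra, on-demand-only node pool for an Auto Mode cluster, the kind of thing you might add for production workloads where spot interruptions are unwelcome. It follows the karpenter.sh/v1 NodePool shape; the pool name and CPU cap are invented for illustration, and the nodeClassRef assumes Auto Mode’s built-in "default" NodeClass.

```yaml
# Sketch only: an on-demand-only NodePool for an EKS Auto Mode cluster.
# The nodeClassRef assumes Auto Mode's built-in "default" NodeClass;
# the name and limits are illustrative.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: prod-on-demand
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com   # Auto Mode's NodeClass API group
        kind: NodeClass
        name: default
      requirements:
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["on-demand"]    # leave "spot" out of this pool
  limits:
    cpu: "64"                      # cap total vCPUs this pool can provision
```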
So touching on what EKS Auto Mode doesn’t do, because it’s not the be all and end all of everything. It’s a very good product, but there are some things it doesn’t do. Firstly, just wanted to point out that it doesn’t introduce a lot of net new functionality on the EKS side of things.
What it really does is it ties together the existing services that were already available in EKS, makes them much, much easier to deploy, much, much easier to configure, and much easier to maintain on an ongoing basis. So you’re not going to see a ton of changes in functionality. It’s just a much more optimized experience with using the services EKS already supported.
There are other things it doesn’t do as well. Karpenter does a great job of making sure that we turn down nodes that aren’t in use, so we don’t have inefficiencies there, but nothing in EKS Auto Mode will resize your workloads to make sure they’re fully optimized. So if you’re requesting far more resources in your environment than you actually need,
Karpenter is going to continue to provision hardware for those requests. If you really want to fully optimize your system, you have to make sure that the workloads themselves are right-sized, and that’s something EKS Auto Mode doesn’t do out of the box. And also, Karpenter uses the node types and instance types that you specify.
By default, it uses a very broad list of instance types, and those are not necessarily optimized. So to really optimize your nodes and make sure you’re using the right instance types and the right families, that does require a little bit of additional work and additional configuration.
Let’s have a quick look now at the overall EKS architecture. I mentioned that there’s not a huge difference in functionality between EKS and EKS Auto Mode; it’s just a much simpler solution to maintain. This is what traditional EKS looks like, where at the bottom you see your infrastructure services and the foundational things required to run a Kubernetes cluster.
On top of that, you’ve got your EC2 compute, the nodes that run all of your workloads. And then at the top level, you’ve got your actual applications running on top of it. In traditional EKS, typically, that would all be maintained by you; you were responsible for making sure that most of that infrastructure was configured, up, and running.
Now, in the new EKS Auto Mode architecture, the architecture itself hasn’t changed, so you see there are no major changes to this diagram. However, the responsibility, the portion you have to maintain and manage yourself, is what’s changed. You continue to manage your workloads, which is vitally important:
your applications, making sure that they are healthy, operational, and properly sized. But everything from the nodes on down is now managed by EKS Auto Mode, something that, for the most part, you don’t have to worry about because it’s handled by the managed service. Now, that having been said, if you recall a couple of slides back, we mentioned that there are still some things that EKS Auto Mode doesn’t do.
So this picture isn’t 100% complete, and the gap is exactly those things: optimization of the workloads and optimization of the nodes that you’re running. So, Andy, I’m going to hand it back to you to cover this next slide, where we talk a bit more about resource optimization.
Yeah, that’s great. Thanks, David. So when we talk about resource optimization, we look on the left and see two major problems that these different groups are trying to solve: making sure there’s enough capacity to run the workloads, but not so much that we’re wasting money or way over-configured, like the traditional way of doing things in the cloud or even on-prem.
From the enough-capacity perspective, it’s the app teams and the site reliability engineering groups that care about that, and the controls they have are the container requests, the container limits, and of course the other components of Kubernetes. On the FinOps side of the house, it’s about making sure there’s not too much infrastructure; the cost is incurred here, in terms of the nodes, the scale groups you’ve configured, the VMs, the bare metal, the infrastructure that’s running those workloads.
So we have to look at that area in terms of controlling costs, but it’s really the container resource level, the CPU and memory requests and limits, that defines how much infrastructure is needed. Those settings don’t directly incur cost, but they definitely affect, indirectly, what’s running at the node level; for example, if every container requests far more CPU than it actually uses, the autoscaler will still provision nodes to satisfy those requests.
So there’s really an interdependency; these two things are intertwined, and we can’t solve the cost problem and the risk problem separately by fixing the nodes and saying, okay, we’re going to scale those up or down.
We really do have to look at those container resources. But setting those things really affects how we’re going to set up the node infrastructure. Long story short, a full stack approach is really required here, where we look at the container resources, and we’re going to show this to you in a demonstration in just a moment here.
We have the ability to do that deep container analysis, looking at whether those settings are correct, the things that will affect the infrastructure: the CPU and memory requests and limits. Have we specified resources in all cases? Because if they’re not specified, that can cause an awful lot of problems as well; David’s going to get into that. And then, by doing so, we get into the deep node and infrastructure analysis and determine whether we’re running on the right type of infrastructure. Are we running on compute-optimized when it really should be memory-optimized, or vice versa? The analytics and the recommendations will tell us if that’s the case. So let’s go back to you, David; let’s put that other diagram up and ask, how does that get affected?
Yeah, absolutely. We’re back to our architecture diagram of EKS Auto Mode standalone, but if you’re combining EKS Auto Mode and Kubex, you really have a better-together story. So we want to go into a little more detail about what exactly changes once you add Kubex to the equation.
So the first thing Kubex does is help you optimally size those workloads, making sure you understand what the appropriate requests and limits are for your different workloads, and making sure that you don’t have containers that are undersized, because undersized containers introduce risk into your environment.
You might have out of memory kills (OOM kills), you might have pods that are being terminated. By the same token, you don’t want containers that are oversized because that represents waste and potentially additional costs. And furthermore, you also don’t want containers that have no requests and limits specified at all, because that’s just as bad as requesting zero resources.
You want to make sure that those scenarios are fully optimized and that you’re accurately setting requests and limits for all of your workloads. Kubex provides that information for you, and by doing so, in most cases it’s actually going to shrink the overall size of those workloads: you’re not going to be paying for compute resources you don’t need, for CPU and memory that aren’t being utilized.
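For reference, this is the knob being tuned: the resources block in a container’s pod spec. A minimal sketch with invented numbers, just to show the shape of a right-sized configuration; in practice the values would come from analysis of observed utilization, which is what Kubex recommends for you.

```yaml
# Sketch only: a right-sized container resources block. Numbers are
# invented for illustration, not recommendations.
resources:
  requests:
    cpu: 250m        # e.g. down from an over-provisioned 1000m
    memory: 512Mi    # previously unspecified, i.e. effectively zero
  limits:
    memory: 512Mi    # guardrail against runaway memory on the node
    # CPU limit deliberately left unset here; many teams don't set one
```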
So you can shrink down those workloads, and by doing so you’re going to be able to reduce the number of nodes you’re running on. Right there, you’re going to achieve additional savings because you’re not running as many nodes. But also, as we mentioned, Kubex can help you inform Karpenter of the best node types and node families to run within your cluster. So instead of just having a whole slew of instance types sitting in your Karpenter node pool, you can have a much more tailored list of instance types that are better suited to your individual workloads.
And by doing so, you can either potentially shrink the size of the nodes that you’re using, or just make them more efficient. Instead of paying for a bunch of compute-optimized instances where you’re paying for CPU that never gets used, maybe you want to move over to a memory-optimized instance type, where you’re not paying for that additional CPU.
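On the Karpenter side, that tailoring boils down to tightening the requirements on a node pool. A hedged sketch, assuming Auto Mode’s eks.amazonaws.com well-known labels; the pool name, instance category, and generation cutoff are illustrative, standing in for whatever an analysis of your workloads recommends.

```yaml
# Sketch only: narrowing an Auto Mode node pool to newer memory-optimized
# instance families, e.g. when analysis shows workloads are memory-bound.
# Label keys assume Auto Mode's eks.amazonaws.com conventions.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: memory-optimized
spec:
  template:
    spec:
      nodeClassRef:
        group: eks.amazonaws.com
        kind: NodeClass
        name: default
      requirements:
        - key: eks.amazonaws.com/instance-category
          operator: In
          values: ["r"]            # "r" families are memory-optimized
        - key: eks.amazonaws.com/instance-generation
          operator: Gt
          values: ["4"]            # skip older instance generations
```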
That’s the architecture at a high level, but we wanted to go a step deeper and at least show you what this looks like in reality. We’re not going to go into exquisite detail, but I just want to show you a before-and-after view of what an EKS Auto Mode cluster looks like when it’s combined with Kubex.
[DEMO]
Very quickly, I’ll just show you what we’re looking at here. On the left-hand side, we’ve got two different clusters that we’ve set up in Kubex.
We’ve got this first one, which is the before scenario: an EKS Auto Mode cluster with spot instances enabled and Karpenter autoscaling enabled, but where we haven’t done any optimization of the workloads or of the nodes. And then we have this second one, which I’ll show you in just a moment.
This second one is fully optimized: we’ve run Kubex against it and implemented the recommendations, and that’s where we have a fully optimized EKS Auto Mode cluster. But first, let’s just look at the before picture. What are we looking at here? I’m not going to explain this screen in exquisite detail.
We’d love to chat with you about it another time, so please feel free to drop me a note; I’m happy to go through what we’re seeing in these histograms in great detail. Just at a high level, I’ll explain that we’re looking at four different histograms, one for each of the dials you can set for a Kubernetes cluster to adjust and right-size your workloads.
We’ve got CPU requests, CPU limits, memory requests, and memory limits. And I’ll point out that we’re not going to go into great detail on CPU limits, because a lot of customers don’t set those; we’re not actually setting those in our cluster either. But if we look at, for example, CPU requests or memory requests, let’s just interpret what we’re seeing here.
If you look at the green bar, those are the containers within your cluster that are just right; they’ve already been right-sized and don’t require any action whatsoever. The higher this bar is, the healthier your environment is. You also see that we have yellow bars here; those represent the containers in our cluster that are over-provisioned,
so potentially we have waste here, additional costs that we could reduce. Red represents the containers that are undersized, where we’re at risk of running into memory issues, containers being restarted, and whatnot. And the last bar to be aware of is this gray one: that’s where we’re not specifying the resource type at all. We’ve left it unspecified, and again, that’s a bad thing, because it’s the same as requesting zero resources.
So, green bar high is good. Yellow and red bars, yellow being waste and red being risk, you want those to be as low as possible.
This is what our just-out-of-the-box cluster looks like, running a number of typical customer workloads. It’s a lab environment, but we are running something fairly similar to what a customer would be running in production. Now, let’s look at the after, after we’ve optimized this with Kubex.
What does this cluster look like now? Well, all of a sudden we can instantly see that the number of just right containers rises significantly. These are already fully optimized. You still have a handful, one or two containers here or there that are not fully optimized. That’s normal.
The way workloads are used and the usage patterns are going to change over time. Sometimes you’ll have a bit of a spike in demand. This is a test cluster, so you are always going to see a handful of containers that still need some optimization, but we’ve seen a huge increase in the number of just right containers.
We’ve also seen a huge drop in the number of containers with unspecified resources; just that before and after is really quite striking. The other thing to notice is that we’re seeing a lot less risk within the environment. And just to get down to a really practical example of how optimizing your workloads with Kubex can have a significant impact on your cluster,
let’s look at this before again. I’m just going to quickly deep dive into the individual containers and their health. I’m going to click here on details, hide the performance charts, and look at one particular field: the number of times this container has had an unplanned restart in the past 24 hours.
What you see here, in the before picture, is just an EKS Auto Mode cluster with our workloads deployed on it, with no optimization of requests and limits: we’re seeing 500 restarts a day. And these are actually important containers. These aren’t just observability tools; these are our actual workloads.
Once we switch over, fully optimize these, and eliminate those undersized workloads where we hadn’t specified enough resources, we can quickly see that this number goes from hundreds of restarts to zero.
Now, there are a number of different scenarios that can cause a container restart, so this is specifically resolving the cases where a container restarted because you weren’t allocating enough resources. But the point is that you can have a significant impact on the health of your cluster just by making sure you’ve optimally sized your workloads. That was what I wanted to show in the demo.
There was one question, David; I’ll just get into it right now while we’re still talking about this topic. I answered it online:
How does Densify deepen the native security features provided by EKS Auto Mode?
And my answer is essentially: look, we’re going to do the optimization at the container, the pod, and the node level, but we don’t touch security unless there’s something about a recommendation that would enhance the way they’re doing things. What’s your take on that?
Yeah, I would say the same thing. Our focus is on doing resource optimization and doing it extremely well, so that’s where we’re going to add the primary value for you. And the good thing is, we are designed to operate in the existing ecosystem. There are a lot of solutions out there that will help you secure your Kubernetes clusters, and we are fully compatible with those, but that’s not functionality we offer at this point. In terms of what we do offer, and you’ve now seen the demo and a few slides on our positioning, Kubex is our latest release: a brand new interface with full-stack, AI-driven resource optimization for Kubernetes, wherever that may be running, whether in AWS, Azure, GKE, or potentially on-prem on OpenShift.
We cover all the areas we talked about today: containers, nodes, and the cloud instances themselves. And it’s not just about cutting costs; it’s also about reducing risk and improving performance. You can see for yourself: go to our main webpage, densify.com, click on the Trial, and use us free for 60 days, with no risk, no credit card, or anything like that. You’d be up and running very quickly.
And if you don’t feel like configuring this for your own environment to get started, we can set you up with a sample demo environment, and you can click around and do a lot of the things David was doing today to experience Kubex without having to set it up and go through the whole process. But certainly there’s a ton of value in having this pointed at your own environment, so once you’re comfortable, we’d encourage you to do so. There’s very low risk in doing that.
We’re just approaching the 11:30 mark here. Everyone, thank you for attending today. We look forward to seeing you at the next webinar next month. All the best, take care. Thanks.