As organizations move their workloads from fixed-capacity on-premise systems to cloud providers, drawn by the promise of cheaper and easier access to computing resources, they must adopt new architectures and resource allocation strategies to realize these benefits. Cloud resource optimization addresses this problem by allocating cloud provider resources efficiently to fulfill business requirements while optimizing cost, scalability, and performance.
Cloud environments include features like elastic compute capacity, which must be leveraged with the right expertise and tooling to keep migrations from turning into cost blowouts and to maximize application performance. Modern cloud architectures, such as containerized workloads on managed Kubernetes platforms, make resource optimization even more complex: you cannot rightsize the nodes until you have first rightsized the container pods running on them, which requires additional analysis to ensure resources are allocated efficiently.
Fundamentally, cloud & container resource optimization consists of four stages that organizations should continuously iterate through.
An overview of the cloud resource optimization process.
This article will explore these stages and cloud resource optimization in detail, including challenges, best practices, and a practical example of how to optimize Kubernetes resources.
The table below summarizes the cloud resource concepts we will explore in this article.
Concept | Description |
---|---|
Analyzing workloads for cloud & container resource optimization | Analysis is required to gain insight into container and instance resource optimization opportunities. Intelligent analysis takes into account multiple factors related to workload and business requirements, such as custom pricing agreements negotiated with a cloud provider, seasonal utilization patterns, peak and average sustained load statistics, resource buffers for production environments, and the time period over which to gather data for rightsizing decisions. Note that container resources must be optimized before the instances they run on. |
Selecting from a broad range of compute instance types | Cloud providers typically offer hundreds of compute instance types, and accurate selections must take into account many hardware attributes (like CPU, memory, disk, network IO, and GPUs) as well as workload utilization patterns. |
Planning purchases with discount programs | Cloud providers offer discounts, including spot and reserved instances, to help control compute costs. Leveraging these pricing models will help organizations maintain cost efficiency. |
Understanding the horizontal and vertical scaling options | Resource allocation can be scaled horizontally (more compute instances) or vertically (larger compute instances). Balancing elements like high availability, operational overhead, and application requirements will help determine the appropriate scaling strategies. |
Implementing the resource configuration changes with automation and continuous optimization | Continuously optimizing resource allocation for compute instances and the containers running on them requires integrating automation tools. This allows workloads to be optimized automatically based on recommendations from machine learning algorithms offered by third-party providers. |
Featured practical example: Kubernetes resource optimization | Kubernetes platforms require rightsizing Pod Request and Limit values to allocate resources to containers, which are then scheduled onto the selected cloud compute instances. Setting these values accurately is critical to ensuring that Kubernetes makes proper scheduling decisions and enforces appropriate resource boundaries, and automation tools can be integrated to keep them up to date. Only once the containers are rightsized can the nodes they run on be optimized. |
Resource optimization begins by gathering workload and business requirements, understanding cloud provider instance types, and determining actionable insights. Let’s look at each of the critical components of this stage.
Optimizing resource utilization and making rightsizing decisions will begin with gathering business and workload requirements.
Business requirements include factors such as custom pricing agreements negotiated with the cloud provider, resource buffers required for production environments, and the time period over which to gather data for rightsizing decisions.
Workload requirements relate to historical utilization patterns. Performing statistical analysis on historical metric data provides insight into appropriate rightsizing strategies, allowing users to optimize resource usage going forward.
A challenge here is analyzing all available metric data to determine rightsizing recommendations accurately. Resources like compute instances and Kubernetes Pods generate large metric data sets, and the volume of data makes it difficult for an engineer to apply statistical analysis and determine rightsizing choices manually (or without the right tools).
Additional complexity comes from correlating data from multiple sources, like AWS CloudWatch and Prometheus, to ensure rightsizing is performed on both cloud resources and containers simultaneously. New metric data is also continuously generated by cloud and container resources, so analysis needs to be repeated regularly to ensure rightsizing decisions do not become stale.
Organizations can benefit from Densify’s Software-as-a-Service (SaaS) resource optimization tool, which collects and aggregates metric data from multiple sources automatically and provides accurate resource optimization recommendations for both cloud resources and containers. Leveraging Densify to analyze and generate recommendations reduces the operational overhead for engineers while ensuring recommendations remain accurate and up to date.
CPU and memory are important, but selecting instance types without considering other resources is a mistake. Other attributes like network IO, GPU availability, and locally-attached SSDs should also be evaluated. It’s worth noting that instance types are only one element of the resource allocation story.
The next step is aggregating data into actionable insights by understanding the patterns visible in the time-series data. Real-world workload utilization follows seasonal patterns over hours, days, and weeks. For example, business-oriented applications likely experience low activity over weekends and holidays and high activity during business hours on weekdays. Analyzing usage patterns in this common scenario requires a daily workload profile based on data from previous weeks: daily metrics are broken down into hourly buckets, and statistical analysis of minimum, maximum, average, and sustained loads is performed on each bucket. Using this approach, administrators can allocate sufficient computing resources to accommodate a workload that varies over time. Densify is a resource optimization solution specifically designed to provide this type of high-precision resource analysis by leveraging machine learning algorithms.
Organizations must be aware of every billable cloud resource, including compute capacity, data transfer across cloud regions, IP address allocations, storage, and premium versions of hosted services. For example, IPv4 addresses will have a charge, but in-use AWS Elastic IPs may be free.
Data transfer is also a complicated topic; cloud providers will charge varying amounts based on data transfer between services in different regions, to the internet, between on-premise and the cloud provider, and between Availability Zones (AZ). The complexity makes it easy to overlook the costs of running high-traffic environments. Carefully reviewing the cost structure will help organizations estimate future bills correctly, avoid unexpected charges, and plan capacity appropriately.
Cloud providers typically supply hundreds of instance types for organizations to deploy compute capacity in various sizes and configurations. Cloud instance types include flexible configurations for CPU capabilities (speed, core count, and architecture), memory, disk capacity and speed, network bandwidth and latency, GPU cards, and local or networked storage. The diversity of available computing options enables organizations to select the optimal configuration that matches their workload’s use case and requirements.
Selecting an instance type requires understanding the specific needs of your workload. Different applications have varying CPU, memory, storage, and network performance demands. For example, AWS Compute Optimized instances are ideal for CPU-intensive applications like batch processing. In contrast, Memory-Optimized instances are better suited for applications like big data processing that store large amounts of in-memory data for time-sensitive calculations. The general-purpose instance families are often a good choice for applications that need a balance of computing power and memory.
Users should consider both the baseline performance and the ability to scale. Some instances offer burstable performance, which can benefit workloads with intermittent usage spikes. However, non-burstable instances are typically recommended for production workloads requiring consistent performance. Additionally, choosing between instances using Intel, AMD, or AWS’s own Graviton processors can affect performance and cost considerations.
Network bandwidth and storage performance can be bottlenecks for specific applications. Instances in the P series (optimized for GPU-based tasks) and I series (optimized for I/O-intensive operations) offer high-speed networking, and instances with high storage capacity and throughput, like the D series, are worth considering for dense storage needs.
The following table provides examples of AWS instance types and their recommended use cases:
Instance family | Use case | Features |
---|---|---|
T2, T3, M4, M5 | General purpose, providing a balance of hardware resources. | Evenly balanced CPU, memory, storage, and network capabilities. |
C4, C5, C6a, C7a | Compute-optimized, good for CPU-intensive workloads. | Include the latest processors, high core counts, and high CPU clock speeds. |
R4, R5, R6a, z1d | Memory-optimized, good for workloads that keep large data sets in memory. | Offer a high memory-to-vCPU ratio and high memory limits (several terabytes for some instance types). |
I4g, I4i, I3en | Storage-optimized, good for IO-sensitive workloads requiring high disk throughput and IOPS. | These instance types come with NVMe storage physically attached to the compute instance. |
P2, P3, P4, P5 | Accelerated computing, useful for AI, machine learning, and cryptocurrency workloads that leverage GPUs for faster processing. | These instance types include NVIDIA GPUs, such as Tensor Core GPUs. |
There are hundreds of instance types to choose from, and a careful evaluation of the workload requirements will help determine which instance types are appropriate for the use case.
There are two fundamental scaling strategies to ensure that compute resources align with workload demands: horizontal and vertical.
Horizontal versus vertical scaling. (Source)
Horizontal scaling offers the key benefit of improving high availability by increasing the number of compute instances running the workload replicas. Scalability and fault tolerance improve as replicas are added, allowing hardware or software failures to occur without impacting application availability. However, applications must be architected to support this approach: horizontally scaling architectures typically require applications to be stateless to avoid losing data on scale-in, to have fast startup times to avoid bottlenecking scale-out operations, and to have load balancing capabilities to share load among the replicas. Microservices are an example of an architecture typically built to scale horizontally.
Cloud environments are uniquely capable of horizontal scaling compared to on-premise environments, which have fixed-capacity constraints on available hardware. All major cloud providers offer services to horizontally scale instances automatically based on metrics, such as AWS Auto Scaling groups. This enables a fleet of instances to change in number dynamically based on utilization metrics (like CPU, traffic request count, or message queue backlog).
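As an illustration, the following Terraform sketch configures an Auto Scaling group with a target-tracking policy that keeps average CPU utilization near a chosen target. The resource names, AMI, subnet IDs, and the 50% target are placeholder assumptions, so treat this as a minimal sketch rather than a production configuration.

```hcl
# Minimal sketch: an Auto Scaling group that scales horizontally on average CPU.
# Names, AMI, subnet IDs, and the 50% target are illustrative placeholders.
resource "aws_launch_template" "web" {
  name_prefix   = "web-"
  image_id      = "ami-0123456789abcdef0" # placeholder AMI
  instance_type = "m5.large"
}

resource "aws_autoscaling_group" "web" {
  name                = "web-asg"
  min_size            = 2
  max_size            = 10
  vpc_zone_identifier = ["subnet-aaa", "subnet-bbb"] # placeholder subnets

  launch_template {
    id      = aws_launch_template.web.id
    version = "$Latest"
  }
}

resource "aws_autoscaling_policy" "cpu_target" {
  name                   = "keep-cpu-at-50"
  autoscaling_group_name = aws_autoscaling_group.web.name
  policy_type            = "TargetTrackingScaling"

  target_tracking_configuration {
    predefined_metric_specification {
      predefined_metric_type = "ASGAverageCPUUtilization"
    }
    target_value = 50
  }
}
```

With a policy like this, the group adds replicas when sustained CPU rises above the target and removes them as load subsides, keeping capacity roughly aligned with demand.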
On the other hand, vertical scaling is a strategy that deals with modifying the size of available compute capacity for workloads to utilize. Vertical scaling involves granting workloads access to additional compute resources like CPU and memory, allowing the application to handle a higher load without increasing the number of compute instances or workload replicas.
Vertical scaling may reduce operational overhead since there are fewer workload replicas and compute instances to deploy, manage, monitor, and upgrade. It requires an architecture designed to utilize the additional resources when they are available: applications with bottlenecks in their code or external dependencies may not experience a performance benefit from vertical scaling, in which case the extra resources are wasted. Testing how your applications behave with varying resource limits and benchmarking the results helps determine whether additional resources will be effectively utilized.
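In Terraform terms, vertical scaling of a single instance is often just a change to the instance type followed by a re-apply. The sketch below is illustrative only; the AMI and instance types are placeholders, and resizing an EC2 instance typically involves stopping and starting it.

```hcl
# Minimal sketch: vertical scaling by changing the instance size.
# The AMI and instance types are placeholders; resizing typically stops and
# restarts the instance, so plan for a brief interruption.
resource "aws_instance" "reporting" {
  ami           = "ami-0123456789abcdef0" # placeholder AMI
  instance_type = "m5.xlarge"             # was "m5.large"; scaled up vertically
}
```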
In practice, organizations typically implement some degree of vertical scaling alongside horizontal scaling to avoid running excessive numbers of compute hosts and workload replicas, balancing availability, scalability, and operational overhead.
Once the optimal resource configuration has been selected for the desired instance types, we can investigate what suitable cost optimization options are available.
The following sections detail the standard purchase planning programs that can provide compute cost savings.
When an instance is launched, users can select whether the instance is an “on-demand” or “spot” type. On-demand instances are assigned to the user with no possibility of the cloud provider withdrawing the compute capacity. On-demand instances cost more than spot but are a reliable compute resource for running stable workloads. Regardless of potential compute capacity exhaustion in the cloud provider data centers, on-demand instances will remain running in the user’s environment.
Spot instances offer an alternative pricing model. These instances can terminate within a couple of minutes (or be subject to a rise in price) if the cloud provider wishes to withdraw the compute capacity. In return for the unreliable nature of spot instances, the pricing is heavily discounted. Spot instances can be up to 90% cheaper than on-demand instances, making them invaluable for cost optimization efforts. You can check the pricing of spot versus on-demand in each cloud provider’s documentation (for example, AWS) to determine what the degree of discount may be (the numbers will vary regularly).
Organizations can reduce the risk of service interruption with spot instances by launching them in multiple availability zones. This helps ensure that even when interruptions occur, computing capacity is still available in other zones. Cloud providers also offer services such as the AWS Spot Instance Advisor to recommend instance types based on their probability of interruption, allowing users to optimize their spot instance selection. You can learn more about managing spot instances in the Spot Instances section of the Complete FinOps Guide.
A typical usage model for spot and on-demand instances is to create a mixed fleet with both usage types. This enables users to benefit from a balance of cost savings and reliable compute capacity. Bear in mind that the workloads must be able to handle short-notice instance terminations when the cloud provider withdraws spot instances. Applications may need to be architected to handle sudden interruptions and shift automatically to available compute resources.
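For illustration, here is a minimal Terraform sketch of such a mixed on-demand/spot fleet using an Auto Scaling group. The launch template reference, instance types, capacities, and allocation strategy are placeholder assumptions rather than recommendations.

```hcl
# Minimal sketch: a mixed fleet where a small on-demand base is supplemented
# by spot capacity spread across several interchangeable instance types.
resource "aws_autoscaling_group" "mixed" {
  name                = "mixed-fleet-asg"
  min_size            = 2
  max_size            = 20
  vpc_zone_identifier = ["subnet-aaa", "subnet-bbb", "subnet-ccc"] # placeholder subnets

  mixed_instances_policy {
    instances_distribution {
      on_demand_base_capacity                  = 2  # always keep 2 on-demand instances
      on_demand_percentage_above_base_capacity = 25 # 25% on-demand / 75% spot beyond the base
      spot_allocation_strategy                 = "capacity-optimized"
    }

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.web.id # placeholder launch template
        version            = "$Latest"
      }
      override { instance_type = "m5.large" }
      override { instance_type = "m5a.large" }
      override { instance_type = "m6i.large" }
    }
  }
}
```

Listing several interchangeable instance types gives the provider more spot pools to draw from, which reduces the chance that an interruption in one pool leaves the fleet short of capacity.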
Reserved instances and savings plans are other cost optimization features related to compute capacity; the idea is to obtain discounts by committing to long-term instance purchase contracts.
Cloud providers typically offer large discounts for organizations committing to long-term resource usage, which benefits organizations that have done capacity planning to determine their requirements. Knowing what instance types are required for your workloads and how long you’ll need them to run for will significantly influence what savings options are appropriate for your organization.
Reserved instances and savings plans allow committing to long-term compute usage in exchange for a discount. The difference between them is that savings plans can offer more flexibility, such as applying discounts to instances running in different regions or instance families. Reserved instances are more restricted, applying to a single region and limiting users to the defined instance types. There are two classes of reserved instances: Standard and Convertible. Convertible reservations allow users to exchange the reservation for another with different attributes, like the instance type, while Standard reservations may be more cost-effective in exchange for limited customizability. Understanding your requirements clearly is crucial to leveraging these features effectively and determining which savings options are suitable for your workloads.
Implementing resource optimization steps as a one-off will have limited usefulness; in reality, a production environment will have a variety of workloads with dynamic requirements, regularly fluctuating in their resource demands. New applications may be deployed, traffic patterns may affect resource utilization throughout the day/week/month/year, cost optimization requirements can change, and many other factors will regularly impact resource optimization efforts for cloud workloads. Attempting to invest engineering effort into manually optimizing resource configurations on a regular cadence will cause significant operational overhead.
The Densify optimization tool can integrate into existing pipelines to ensure analysis is completed during every build, allowing each application deployment to be configured with updated resource recommendations. Integrating automation tools with Densify’s recommendations enables customers’ cloud and container resources to be optimized on a regular cadence.
Once the application has been running in production for a few weeks, there will be enough data for the Densify optimization tool to analyze and accurately recommend optimizations based on past patterns and real-world usage information. The tool can then produce recommendations for integrations like an AWS AutoscalingGroup’s instance family and type configuration, a Kubernetes Deployment’s request/limit values, or an EC2 instance configuration defined directly in a CloudFormation or Terraform template (using Densify’s Terraform module). An example of integration with Terraform is as follows:
```hcl
instance_type = module.densify.instance_type
```
The instance type is dynamically set whenever Terraform is executed, ensuring instance types are being updated automatically based on the latest recommendations. The result for the engineers managing the environment is that resource values are output by Densify and injected into the resource configuration without requiring manual steps like metrics analysis. The environment now benefits from Densify’s accurate optimization recommendations being automatically integrated and deployed to the environment on a regular cadence, ensuring cloud and container rightsizing is occurring continuously.
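To make the wiring concrete, a sketch of how such an output might be consumed is shown below. The module source, path, and inputs are hypothetical placeholders; the actual interface of Densify’s Terraform module may differ.

```hcl
# Hypothetical sketch: consuming an optimization recommendation as a module output.
# The module source and the AMI are placeholders, not Densify's actual interface.
module "densify" {
  source = "./modules/densify-recommendation" # placeholder path
}

resource "aws_instance" "app" {
  ami           = "ami-0123456789abcdef0"      # placeholder AMI
  instance_type = module.densify.instance_type # latest recommended instance type
}
```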
Kubernetes is the most common platform (on-premise and in the cloud) for running containerized workloads. While Kubernetes simplifies many aspects of application lifecycle management, it introduces additional complexity regarding resource allocation in the cloud. Optimizing the cloud instances that containers run on must start with optimizing the Pods’ containers first. Resources for Pods are allocated through “Request” and “Limit” values.
Requests determine the minimum guaranteed resources a Pod should be allocated from a host compute instance. Limit values define the maximum resources the Pod can utilize before being throttled or evicted from the compute instance. Kubernetes administrators face the complex challenge of determining accurate Request and Limit values for all Pods in their clusters while accounting for resource requirements that change with seasonality.
Here is an example of how the Densify solution can be used to analyze and optimize workloads on Kubernetes automatically:
1) We have a Deployment object configured with a web application. The Request and Limit values have been set manually based on an administrator’s best guess, which is unlikely to be accurate. In this configuration, 500 millicores of CPU and 256Mi of memory are allocated at minimum, with a limit of 1000 millicores of CPU and 512Mi of memory.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
        - name: nginx-webapp
          image: nginx:latest
          resources:
            requests:
              cpu: "500m"
              memory: "256Mi"
            limits:
              cpu: "1000m"
              memory: "512Mi"
```
2) Densify collects Prometheus metrics from the Kubernetes cluster to determine Pod and Node utilization history via kube-state-metrics (a metrics collection tool). Densify then performs intelligent statistical analysis on this data and provides recommendations exposed via an API or integrated with tools like Terraform, enabling organizations to integrate and deploy the recommendations easily. The recommendations take into account factors such as long-term patterns, seasonality, nuanced requirements like network and disk I/O, compatibility with the latest instance types, and business-specific requirements. An example of a business-specific requirement is an organization that must deploy on a limited set of instances based on agreements with software vendors; Densify can take this into account via its Policy Engine to ensure instance recommendations are tailored to the organization. Leveraging machine learning capabilities provides far more accurate recommendations than administrators determining appropriate values manually, and the process can be automated.
3) Densify provides an updated recommendation for the Request and Limit values that more appropriately suits the workload’s requirements. The data can be integrated with Terraform, Helm, and other tools used to deploy the Kubernetes workloads. Only then can one look at optimizing the nodes (cloud instances) that the Pods are running on.
```yaml
resources:
  requests:
    cpu: "700m"
    memory: "512Mi"
  limits:
    cpu: "1200m"
    memory: "1000Mi"
```
4) Even after the recommended change has been deployed, Densify will continue evaluating and calculating further recommendations for the Kubernetes workload to ensure it is always accurately meeting changing requirements.
Optimizing resources in Kubernetes clusters can be a complex operation due to the number of components deployed in a standard setup and the amount of telemetry information generated by Pods and Nodes. Leveraging tools to analyze and provide recommendations automatically helps reduce operational overhead and manual analysis required by administrators.
Performing usage data analysis can be complex and time-consuming, especially in environments with large numbers of instances and a variety of different workload types running, and Kubernetes only adds to this complexity. The amount of metric data being generated will be significant and must be analyzed regularly to ensure resource allocation is aligned with workload requirements.
Densify is a resource optimization solution that can help with this challenge of rightsizing cloud instances and the Kubernetes containers running on them to reduce both risk and waste. Densify’s patented, policy-based analytics engine learns the operational workload patterns of each instance and applies customer-refined policy to produce precise and accurate recommendations. When ready, the implementation of recommendations can be automated.