In cloud computing, scaling is the process of adding or removing compute, storage, and network resources to meet the demands of a workload. Autoscaling (sometimes spelled auto scaling or auto-scaling) is the process of automatically increasing or decreasing the computational resources delivered to a cloud workload based on need. In practice, this usually means automatically adding or removing the active servers (instances) serving your workload within an infrastructure. The demand for computational resources is typically gauged from metrics such as CPU utilization, memory usage, network throughput, and the rate of incoming requests.
The primary benefit of autoscaling, when configured properly, is that your workload gets exactly the cloud computational resources it requires (no more, no less) at any given time. You pay only for the server resources you need, when you need them.
There are two broad categories of scaling: vertical and horizontal.
Vertical scaling is the process of resizing a server (or replacing it with another server) to give it more or fewer CPUs, memory, or network capacity.
One of the advantages of vertical scaling is that it minimizes operational overhead. There’s only one server to manage. There’s no need to distribute workload and coordinate among multiple servers.
There are limits, though. You can add only so much memory and CPU to a single instance. Even if there’s an instance that has sufficient CPU and memory, some of those resources may be idle at times—yet the customer continues to pay for those unused resources.
Vertical scaling is best used for applications that are difficult to distribute. For example, when a relational database is distributed, the system must accommodate transactions that can change data on multiple servers. Major relational databases can be configured to run on multiple servers, but it’s often easier to vertically scale.
Horizontal scaling takes a different approach. Instead of moving an application to a bigger server, you split the workload across multiple servers. For example, a retail website might have peak periods, such as the end-of-year holidays. During those times, additional servers can be brought online to handle the extra traffic. Applications like websites are well suited to horizontal scaling because each user can be assigned to a single server, so there is little need to coordinate tasks between servers. Many front-end applications and microservices can take advantage of horizontal scaling.
Horizontally scaled applications can adjust the number of servers in use according to the workload-demand patterns.
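The idea of matching server count to demand can be sketched in a few lines of Python. This is an illustrative model, not any cloud provider's API: the function name, per-server capacity, and the min/max bounds are all hypothetical values chosen for the example.

```python
import math

# Hypothetical sizing helper: how many identical servers are needed to
# handle a given request rate, clamped to min/max bounds. All names and
# numbers here are illustrative, not taken from any cloud API.
def desired_server_count(requests_per_sec: float,
                         capacity_per_server: float = 500.0,
                         min_servers: int = 2,
                         max_servers: int = 10) -> int:
    needed = math.ceil(requests_per_sec / capacity_per_server)
    return max(min_servers, min(max_servers, needed))

print(desired_server_count(100))    # quiet period: stays at the floor of 2
print(desired_server_count(2600))   # holiday peak: scales out to 6
print(desired_server_count(99999))  # extreme spike: capped at the maximum of 10
```

Real autoscalers apply the same clamping logic, though they typically react to utilization metrics averaged over a time window rather than an instantaneous request rate.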
The major cloud vendors all offer autoscaling capabilities. In AWS, the feature is called Auto Scaling groups. In Google Cloud, the equivalent feature is called managed instance groups, and Microsoft Azure provides Virtual Machine Scale Sets. Each of these provides the same core capability to scale horizontally, so for simplicity we'll focus on AWS Auto Scaling groups.
An Auto Scaling group is a set of servers that are configured the same and function as a single resource. This group is sometimes called a cluster. Workloads are distributed across the servers by a load balancer. The load balancer is an endpoint that allows clients to connect to the service without having to know anything about the configuration of the cluster. The client just needs to have the DNS name or IP address of the load balancer.
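A toy round-robin sketch shows the load balancer idea: clients connect to one endpoint, and requests are spread across interchangeable servers behind it. The IP addresses are hypothetical, and real load balancers use more sophisticated algorithms (least connections, health-aware routing) than plain round robin.

```python
from itertools import cycle

# Hypothetical cluster of interchangeable servers behind one endpoint.
servers = ["10.0.1.11", "10.0.1.12", "10.0.1.13"]

# Round robin: each incoming request goes to the next server in turn,
# so no client ever needs to know the cluster's configuration.
route = cycle(servers)
assignments = [next(route) for _ in range(6)]
print(assignments)  # six requests, two landing on each server
```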
What instance (node) type and size should be used in an Auto Scaling group? That depends on the workload patterns. Instances should have a combination of CPUs and memory that meets the needs of workloads without leaving resources unused.
When creating an Auto Scaling group, you have to specify a number of parameters, including the minimum and maximum number of instances to keep in the cluster and the criteria that trigger adding or removing an instance. The choice of parameter values will determine the cost of running the cluster.
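The core parameters can be sketched as plain Python dicts. The field names below follow AWS naming conventions (as used by tools like boto3), but the group name and all values are illustrative assumptions, not a working API call.

```python
# Sketch of an Auto Scaling group's sizing parameters. Field names follow
# AWS conventions; "web-tier" and all numbers are hypothetical examples.
asg_config = {
    "AutoScalingGroupName": "web-tier",
    "MinSize": 2,             # enough instances to cover the base load
    "MaxSize": 10,            # upper bound, which also caps cost
    "DesiredCapacity": 2,     # start at the base load
}

# One common trigger style: target tracking, which adds or removes
# instances to hold a metric (here, average CPU) near a target value.
scaling_policy = {
    "PolicyType": "TargetTrackingScaling",
    "TargetTrackingConfiguration": {
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 70.0,  # aim for roughly 70% average CPU
    },
}

# Sanity check: desired capacity must sit within the min/max bounds.
assert asg_config["MinSize"] <= asg_config["DesiredCapacity"] <= asg_config["MaxSize"]
```

A target of around 70% CPU is a common starting point: high enough to avoid paying for idle capacity, with headroom to absorb a spike while new instances launch.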
The minimum number of instances should be enough to meet the base application load, but not have too much unused capacity. A single instance configured to meet these low-end needs may seem like the optimal choice for the instance type, but that’s not necessarily the case. In addition to thinking about the minimum resources required, you should consider the optimal increment for adding instances.
For example, a t3.xlarge, with four virtual CPUs and 16 GB of memory, may be a good fit for the minimum resources needed in a cluster. When the workload exceeds the threshold set for adding an instance, such as CPU utilization exceeding 90% for more than three minutes, another instance of the same type is added to the cluster: in this case, another t3.xlarge. If the marginal workload that triggered the addition isn't enough to use all of the new instance's CPUs and memory, the customer pays for unutilized capacity. Here, a t3.large, with two virtual CPUs and 8 GB of memory, may be a better option; the minimum number of instances can be set to two to meet the base load.
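The cost of scaling in coarse versus fine increments can be made concrete with some quick arithmetic. The per-hour prices below are approximate on-demand figures for a single AWS region and will drift over time; treat them as illustrative inputs, not a quote.

```python
# Approximate on-demand prices in USD per hour (illustrative; check
# current AWS pricing for your region before relying on these numbers).
PRICE = {"t3.large": 0.0832, "t3.xlarge": 0.1664}
VCPUS = {"t3.large": 2, "t3.xlarge": 4}

def hourly_cost(instance_type: str, count: int) -> float:
    return PRICE[instance_type] * count

# Suppose the base load needs 4 vCPUs and a spike adds demand for 2 more.
coarse = hourly_cost("t3.xlarge", 2)  # 2 x t3.xlarge = 8 vCPUs provisioned
fine = hourly_cost("t3.large", 3)     # 3 x t3.large  = 6 vCPUs provisioned

print(f"coarse increments: ${coarse:.4f}/hr for {VCPUS['t3.xlarge'] * 2} vCPUs")
print(f"finer increments:  ${fine:.4f}/hr for {VCPUS['t3.large'] * 3} vCPUs")
```

The finer increment provisions two fewer idle vCPUs for the same marginal demand, which is exactly the waste the smaller instance type avoids.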
Densify enables managers of large autoscaling infrastructures to optimize performance and spend. Our machine learning analyzes the loads across all your Auto Scaling group nodes and generates recommendations for node type and sizing to better match the entire group to the evolving demands of the workload running within it. Densify also recommends changes to the minimum and maximum autoscaling settings to better reflect the actual demand patterns of your workload.
Get a demo of Densify today and see how your team can leverage our recommendations to optimize and efficiently manage your autoscaling cloud infrastructure.

Request a Demo