In cloud computing, scaling is the process of adding or removing compute, storage, and network resources to meet a workload's demand, maintaining availability and performance as utilization increases. Scaling generally refers to adding or reducing the number of active servers (instances) serving your workload. Scaling up and scaling out are the two dimensions along which resources, and therefore capacity, can be added.
The computational resource demands of your cloud workloads are usually determined by:
Scaling up refers to making an infrastructure component more powerful (larger or faster) so it can handle more load, while scaling out means spreading the load across additional components running in parallel.
Vertical scaling is the process of resizing a server, or replacing it with another, to give it more (or fewer) CPUs, memory, or network capacity.
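To make this concrete, here is a minimal sketch of resizing an EC2 instance with boto3; the instance ID and target type are hypothetical placeholders, and an EBS-backed instance must be stopped before its type can be changed.

```python
import boto3

ec2 = boto3.client("ec2")

INSTANCE_ID = "i-0123456789abcdef0"  # placeholder: the server being resized
NEW_TYPE = "m5.2xlarge"              # placeholder: the larger target size

# The instance must be stopped before its type can be changed.
ec2.stop_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[INSTANCE_ID])

# Resize the instance in place, then bring it back online.
ec2.modify_instance_attribute(InstanceId=INSTANCE_ID, InstanceType={"Value": NEW_TYPE})
ec2.start_instances(InstanceIds=[INSTANCE_ID])
ec2.get_waiter("instance_running").wait(InstanceIds=[INSTANCE_ID])
```

Note that the resize requires a stop/start cycle, so the workload is briefly offline while the server changes size.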
Vertical scaling minimizes operational overhead because there is only one server to manage. There is no need to distribute the workload and coordinate among multiple servers.
Vertical scaling is best used for applications that are difficult to distribute. For example, when a relational database is distributed, the system must accommodate transactions that can change data across multiple servers. Major relational databases can be configured to run on multiple servers, but it’s often easier to vertically scale.
Vertical scaling poses challenges to certain workload types:
Instead of moving a workload to a bigger server, scaling out splits the workload across multiple servers that work in parallel.
Applications that can serve each request from a single machine, like many stateless websites, are well-suited to horizontal scaling because there is little need to coordinate tasks between servers. For example, a retail website might have peak periods, such as the end-of-year holidays. During those times, additional servers can easily be committed to handle the additional traffic.
Many front-end applications and microservices can leverage horizontal scaling. Horizontally scaled applications can adjust the number of servers in use according to the workload's demand patterns.
The main limitation of horizontal scaling is that it often requires the application to be architected with scale-out in mind, so that the workload can be distributed across multiple servers.
The major cloud providers each offer a service for this: AWS Auto Scaling groups, Azure Virtual Machine Scale Sets, and Google Cloud Managed Instance Groups. Each provides the same core capability to horizontally scale, so we'll focus on AWS Auto Scaling groups for simplicity.
An Auto Scaling group is a set of identically configured servers that function as a single resource; this group is sometimes called a cluster. A load balancer distributes workloads across the servers and acts as an endpoint that allows clients to connect to the service without knowing anything about the configuration of the cluster. The client just needs the DNS name or IP address of the load balancer.
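As a sketch of how such a group is defined, the boto3 call below creates an Auto Scaling group and registers its instances with an existing load balancer target group; the launch template name, subnet IDs, and target group ARN are hypothetical placeholders.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# All names, IDs, and ARNs below are placeholders for illustration.
autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-cluster",
    LaunchTemplate={
        "LaunchTemplateName": "web-server-template",  # defines AMI, instance type, etc.
        "Version": "$Latest",
    },
    MinSize=2,
    MaxSize=10,
    DesiredCapacity=2,
    # Subnets the instances are launched into.
    VPCZoneIdentifier="subnet-aaaa1111,subnet-bbbb2222",
    # New instances are registered with the load balancer's target group,
    # so client traffic is spread across the cluster automatically.
    TargetGroupARNs=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc123"
    ],
)
```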
What instance (node) type and size should be used in an Auto Scaling group? That depends on the workload patterns. Instances should have a combination of CPUs and memory that meets the needs of workloads without leaving resources unused.
When creating an Auto Scaling group, you have to specify a number of parameters, including the minimum and maximum number of instances in the cluster and the criteria that trigger adding or removing an instance. The choice of parameter values will determine the cost of running the cluster.
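One common way to express the trigger criterion is a target-tracking scaling policy, sketched below for the hypothetical group from the previous example; it adds or removes instances to hold the group's average CPU utilization near a target value.

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Target tracking: the group grows or shrinks automatically to keep
# average CPU utilization across its instances near 60%.
autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-cluster",  # hypothetical group from above
    PolicyName="cpu-target-60",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization",
        },
        "TargetValue": 60.0,
    },
)
```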
The minimum number of instances should be enough to meet the base application load, but not have too much unused capacity. A single instance configured to meet these low-end needs may seem like the optimal choice for the instance type, but that’s not necessarily the case. In addition to thinking about the minimum resources required, you should consider the optimal increment for adding instances.
For example, a t3.xlarge with four virtual CPUs and 16GB of memory may be a good fit for the minimum resources needed in a cluster. When the workload exceeds the thresholds set for adding an instance, such as CPU utilization exceeding 90% for more than three minutes, another instance of the same type is added to the cluster. In this case, another t3.xlarge would be added. If the marginal workload that triggered the addition of a server isn't enough to utilize all the added CPUs and memory, the customer will be paying for unused capacity. In this case, a t3.large with two virtual CPUs and 8GB of memory may be a better option, with the minimum number of instances set to two to meet the base load.
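The arithmetic behind this trade-off is easy to sketch; the demand figures below are hypothetical, chosen only to show how the smaller scaling increment wastes less capacity.

```python
# Hypothetical comparison of scaling increments for the same base capacity:
# one t3.xlarge (4 vCPUs, 16GB) vs. two t3.large (2 vCPUs, 8GB each).

BASE_DEMAND = 4      # vCPUs needed at steady state (hypothetical)
MARGINAL_DEMAND = 1  # extra vCPUs of load that trips the scale-out trigger

# Option A: scale in t3.xlarge increments of 4 vCPUs.
xlarge_provisioned = 4 + 4  # base instance plus one scale-out step
xlarge_idle = xlarge_provisioned - (BASE_DEMAND + MARGINAL_DEMAND)  # 3 idle vCPUs

# Option B: scale in t3.large increments of 2 vCPUs (minimum of two instances).
large_provisioned = 2 * 2 + 2  # two base instances plus one scale-out step
large_idle = large_provisioned - (BASE_DEMAND + MARGINAL_DEMAND)  # 1 idle vCPU

print(f"t3.xlarge increments: {xlarge_idle} idle vCPUs after scaling out")
print(f"t3.large increments:  {large_idle} idle vCPU after scaling out")
```

Under these assumptions, the smaller instance type leaves a third as much capacity idle after the first scale-out event.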
Manually monitoring resource utilization across your infrastructure is a continuous, time-consuming process, and forecasting the need to scale up or scale out is computationally complex.
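For a sense of what the manual approach involves, the sketch below pulls a day of average CPU utilization for a single Auto Scaling group from CloudWatch; repeating this continuously, for every group and every metric, is the burden that more sophisticated strategies aim to remove. The group name is a hypothetical placeholder.

```python
from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Hourly average CPU utilization for one group over the past 24 hours.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "web-cluster"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=["Average"],
)

for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f'{point["Average"]:.1f}%')
```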
Within most organizations, these challenges are addressed across a continuum of strategy sophistication:
At enterprise scale—when scaling is applied across many workloads—the stakes can be quite high:
Densify enables managers of large autoscaling infrastructures to optimize performance and spend. Our machine learning analyzes the loads across all your Auto Scaling group nodes and generates node type and sizing recommendations that better match each group to the evolving demands of the workload running on it. Densify also recommends changes to the minimum and maximum autoscaling settings to better reflect your workload's actual demand patterns.
Get a demo of Densify today and see how your team can leverage our recommendations to optimize your autoscaling cloud infrastructure and manage it efficiently and prudently.
Request a Demo