Surviving the Server Chip Shortage

Struggling with lengthy server lead times?

The supply challenges with computer chips have lengthened lead times for physical servers worldwide, and almost every company we’ve talked with has felt the impact.

Getting approval for these large capital expenses was difficult enough when lead times were 3 months. Now, with lead times of up to a year and prices also increasing, it is becoming exceedingly difficult to plan and budget properly.

The Solution

There are a few responses that we hear often:

  1. Move to the Cloud
  2. Place orders well in advance on speculation you will need more servers
  3. Extend the life of existing servers by delaying your normal life-cycle refresh
  4. Outsource to third-parties (if they can commit to supply)

Another solution, which we will cover in more detail, is to improve the efficiency of your existing servers so they can accommodate your internal demand and growth.

It is common to find that approximately 45 % of existing applications are allocated far more resources than they require. This includes vCPU, memory, and storage. If you can identify those specific workloads and reclaim the unused capacity, you can reassign it to provision additional applications on your existing server pool.

Below is an example:

There are a few key steps to freeing up the unused capacity sitting in your data center.

Step 1: Identify those workloads that are using far fewer resources than allocated
Step 2: Determine the amount of resources that can be reclaimed from each workload
Step 3: Reclaim those specified resources
Step 4: Assign those reclaimed resources to new builds or growth workloads

Step 1: Identify those workloads that are using far fewer resources than allocated

We recommend setting thresholds for each resource so that you can systematically analyze each workload and determine which ones have excess resources.

Typical values are 30 % or less vCPU utilization, yy % or less memory utilization, and xx % for storage. Next, consider the precise metric you want to measure. Is it average utilization, sustained utilization, or peak utilization? Do you look at the 90th percentile, the 95th, or perhaps the 98th? Finally, do you look at the past week, month, or quarter, and within that period do you look at the busiest day or at averages?
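To make the choice of metric concrete, here is a minimal Python sketch that computes the average, 95th percentile, and peak for one series of utilization samples. The sample data, the function names, and the nearest-rank percentile method are all assumptions made for illustration; your monitoring tool may define percentiles differently.

```python
import math
from statistics import mean


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(samples):
    """Return the candidate metrics discussed above for one workload."""
    return {
        "average": mean(samples),
        "p95": percentile(samples, 95),
        "peak": max(samples),
    }


# Hypothetical hourly vCPU utilization (%) for one day:
# mostly idle, with a four-hour batch job at the end.
cpu_day = [5] * 20 + [70, 85, 90, 60]

summary = summarize(cpu_day)
print(summary)
```

With this made-up data the average sits well under a 30 % threshold, while the 95th percentile and the peak do not, so the metric you pick changes whether the workload looks oversized.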

We also recommend considering different thresholds depending on whether you are looking at a Dev, Test, or Production environment. We don’t recommend relying on averages or low percentiles, as these can hide critical application requirements.
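One way to apply environment-specific thresholds is sketched below in Python. Only the 30 % vCPU figure comes from the text above. Every other threshold, the workload names, and the `Workload` structure are hypothetical placeholders that you would replace with your own values.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    environment: str   # "dev", "test", or "production"
    cpu_p95: float     # 95th percentile vCPU utilization, in %
    memory_p95: float  # 95th percentile memory utilization, in %


# Assumed low-utilization thresholds (%) per environment. Only the 30 % vCPU
# value appears in the article; the rest are illustrative.
LOW_THRESHOLDS = {
    "dev": {"cpu": 40.0, "memory": 50.0},
    "test": {"cpu": 35.0, "memory": 45.0},
    "production": {"cpu": 30.0, "memory": 40.0},
}


def oversized_resources(workload):
    """List the resources whose utilization is at or below the low threshold."""
    limits = LOW_THRESHOLDS[workload.environment]
    flagged = []
    if workload.cpu_p95 <= limits["cpu"]:
        flagged.append("cpu")
    if workload.memory_p95 <= limits["memory"]:
        flagged.append("memory")
    return flagged


workloads = [
    Workload("billing-api", "production", cpu_p95=22.0, memory_p95=71.0),
    Workload("build-agent", "dev", cpu_p95=38.0, memory_p95=44.0),
    Workload("search-index", "production", cpu_p95=64.0, memory_p95=83.0),
]

# Keep only the workloads with at least one flagged resource.
candidates = {}
for w in workloads:
    flagged = oversized_resources(w)
    if flagged:
        candidates[w.name] = flagged

print(candidates)
```

Looser thresholds for Dev and Test are a design choice in this sketch, not a rule: the assumption is that non-production workloads can tolerate tighter sizing.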

At Densify, we provide a set of policies to help you configure each of these parameters, so the analysis and recommendations are based on your specific company requirements.

Step 2: Determine the amount of resources that can be reclaimed from each workload

Ideally, you set thresholds so that the appropriate amount can be calculated rather than estimated.

High thresholds of 80 % vCPU and 90 % memory are common. Once these are set, you can calculate the exact amount of resources that can be reclaimed to place that workload between your low and high threshold settings. We have found that a graphical view of the before and after resource consumption measurements goes a long way toward making an application owner comfortable with the recommendation.
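The calculation can be sketched as follows in Python. This is an illustration under stated assumptions, not a definitive sizing algorithm: it assumes the measured utilization is a percentage of the current allocation, that resources are resized in whole units, and that the function name `right_size` and the sample numbers are hypothetical.

```python
import math


def right_size(allocated, utilization_pct, low_pct, high_pct):
    """Find the smallest whole-unit allocation that keeps utilization at or
    below the high threshold, and report what can be reclaimed."""
    # Units actually in use = allocated * utilization_pct / 100.
    # Dividing that by the high threshold gives the minimum allocation.
    needed = math.ceil(allocated * utilization_pct / high_pct)
    new_allocation = min(allocated, max(1, needed))
    new_utilization = allocated * utilization_pct / new_allocation
    return {
        "new_allocation": new_allocation,
        "reclaimed": allocated - new_allocation,
        "new_utilization_pct": new_utilization,
        # True when the resized workload lands between the two thresholds.
        "within_band": low_pct <= new_utilization <= high_pct,
    }


# Hypothetical workload: 8 vCPU at 20 % utilization, 32 GB at 25 % utilization.
cpu = right_size(allocated=8, utilization_pct=20.0, low_pct=30.0, high_pct=80.0)
memory = right_size(allocated=32, utilization_pct=25.0, low_pct=40.0, high_pct=90.0)

print(cpu)     # 8 vCPU shrinks to 2 vCPU, reclaiming 6
print(memory)  # 32 GB shrinks to 9 GB, reclaiming 23
```

The before and after utilization values returned here are the same numbers you would plot for the application owner.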

If you have established defined workload sizes (often referred to as t-shirt sizing), such as 2 vCPU x 4 GB memory with the next size up being 4 vCPU x 8 GB memory, make sure your recommendations adhere to these standards.
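Snapping a calculated requirement to a standard size could look like the Python sketch below. The first two sizes are the ones mentioned above. The third size and the function name `fit_to_standard_size` are assumptions added for illustration.

```python
# Standard sizes as (vCPU, memory in GB), smallest first.
# (2, 4) and (4, 8) come from the text; (8, 16) is an assumed next size.
STANDARD_SIZES = [(2, 4), (4, 8), (8, 16)]


def fit_to_standard_size(vcpu_needed, memory_gb_needed):
    """Return the smallest standard size that covers both requirements,
    or None if no standard size is large enough."""
    for vcpu, memory_gb in STANDARD_SIZES:
        if vcpu >= vcpu_needed and memory_gb >= memory_gb_needed:
            return (vcpu, memory_gb)
    return None


print(fit_to_standard_size(2, 4))   # exact match on the smallest size
print(fit_to_standard_size(2, 6))   # memory forces the next size up
print(fit_to_standard_size(16, 8))  # nothing fits, so no recommendation
```

Note that rounding up to a standard size can reduce the amount reclaimed, since the more demanding of the two resources decides the size.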

Step 3: Reclaim those specified resources

This can be a one-step or two-step process. At many companies, the first step is a change request sent to the workload or application owner to approve the change. The second step is a change request to implement it. Many customers have combined these into a streamlined process that uses a single change request.

Once approved, you can manually adjust the workload resource configuration during a maintenance window or use an automated process. Automated processes can be set to make the changes only during a defined maintenance window, or at a specified time outside of prime business hours. The procedure itself is quick, but it does require a shutdown and restart of the instance.
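As a rough illustration of gating an automated change on a maintenance window, the Python sketch below checks whether a given time falls inside a daily window, including one that wraps past midnight. The window times and function names are hypothetical, and the actual resize call is left out because it depends on your platform.

```python
from datetime import datetime, time


def in_maintenance_window(now, start, end):
    """Return True if now falls inside the daily window [start, end)."""
    current = now.time()
    if start <= end:
        return start <= current < end
    # The window wraps past midnight, for example 22:00 to 02:00.
    return current >= start or current < end


def next_action(now, start, end):
    """Decide whether an approved change may be applied right now."""
    return "apply" if in_maintenance_window(now, start, end) else "defer"


# Assumed nightly window from 22:00 to 02:00.
WINDOW_START = time(22, 0)
WINDOW_END = time(2, 0)

print(next_action(datetime(2024, 1, 6, 23, 30), WINDOW_START, WINDOW_END))
print(next_action(datetime(2024, 1, 7, 1, 15), WINDOW_START, WINDOW_END))
print(next_action(datetime(2024, 1, 7, 14, 0), WINDOW_START, WINDOW_END))
```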

Step 4: Assign those reclaimed resources to new builds or growth workloads

This is the easiest and most rewarding step. Depending on your provisioning process, you should have immediate visibility into and access to these reclaimed resources to provision new workloads.