How much consolidation is possible, and are there any risks to having a high server consolidation ratio?
It's important for new adopters to understand that server consolidation is not a single static number -- there is no correct server consolidation ratio, and every business must establish consolidation goals that are sensible and appropriate for the organization's unique needs. For example, 100% consolidation is certainly possible (where all of the server's available computing resources are used), but it isn't always necessary or appropriate.
Consider an organization with 100 physical servers. Suppose those servers are virtualized and each server takes on two workloads (rather than one). This 2:1 server consolidation ratio cuts the server count in half; 50 servers can do the work of 100. It also cuts server capital and monthly power costs roughly in half. It's a terrific goal with substantial payback, even though two workloads may not come near the server's total computing capacity.
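The arithmetic behind that example can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes every workload starts on its own physical server and that workloads are spread evenly across hosts.

```python
import math

def servers_needed(workloads, ratio):
    """Hosts required when each host runs `ratio` workloads."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    # Round up: a partly filled host is still a whole server.
    return math.ceil(workloads / ratio)

def fraction_saved(workloads, ratio):
    """Share of servers retired, assuming one workload per server beforehand."""
    before = workloads
    after = servers_needed(workloads, ratio)
    return (before - after) / before

print(servers_needed(100, 2))   # 50
print(fraction_saved(100, 2))   # 0.5
```

The rounding matters once the ratio no longer divides the workload count evenly; at 3:1, for instance, 100 workloads still need 34 hosts.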
Workload migration tools, such as Microsoft's Live Migration, are the principal means of moving and consolidating workloads. As experience and confidence with virtualization grow, organizations will generally implement subsequent phases of consolidation in order to systematically migrate more workloads onto fewer servers as computing resources allow. For example, the 2:1 server consolidation ratio above may be followed by additional consolidation later on, perhaps shaving another 10 servers from the total system count.
Full (100%) consolidation may be undesirable for practical reasons. For example, some workloads present highly variable or cyclical resource demands. Such workloads may need additional resources that are used only at certain times of day, on certain days of the month, or during periods of high user demand. In other situations, it may be prudent to leave some computing resources available on each server to accommodate failover from other servers. If every server is fully consolidated, there will be inadequate resources available for failover, and this can leave affected workloads unavailable until the troubled server is repaired.
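The failover concern can be checked with a small what-if calculation. The sketch below is a simplified illustration, not how any particular management tool works: it assumes each VM's demand and each host's capacity can be expressed in a single abstract "resource unit," and it uses a first-fit, largest-first heuristic that may occasionally report a failure where a cleverer placement would succeed. The function name and data layout are hypothetical.

```python
def survives_single_failure(hosts):
    """For each host, report whether its VMs would fit on the surviving hosts.

    `hosts` maps a host name to {"capacity": units, "vms": [units, ...]}.
    """
    result = {}
    for failed, info in hosts.items():
        # Spare capacity on every host except the one that failed.
        spare = {
            name: h["capacity"] - sum(h["vms"])
            for name, h in hosts.items()
            if name != failed
        }
        ok = True
        # Place the largest displaced VMs first (first-fit decreasing heuristic).
        for demand in sorted(info["vms"], reverse=True):
            target = next((n for n, s in spare.items() if s >= demand), None)
            if target is None:
                ok = False
                break
            spare[target] -= demand
        result[failed] = ok
    return result

# Every host fully consolidated: nothing is left over for failover.
packed = {
    "A": {"capacity": 100, "vms": [50, 50]},
    "B": {"capacity": 100, "vms": [60, 40]},
    "C": {"capacity": 100, "vms": [70, 30]},
}
# The same hosts with headroom deliberately left on each one.
relaxed = {
    "A": {"capacity": 100, "vms": [40, 30]},
    "B": {"capacity": 100, "vms": [35, 30]},
    "C": {"capacity": 100, "vms": [30, 30]},
}
print(survives_single_failure(packed))
print(survives_single_failure(relaxed))
```

In the fully packed case no host's workloads can be restarted elsewhere, while the layout with reserved headroom tolerates the loss of any single host.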
Higher levels of consolidation generally pose greater risks to workload availability and recovery time.
Consider a single VM on a single virtualized server. If the server fails, IT staff need only restore or restart that one workload on another system. The situation is more challenging when multiple VMs are on the same system. As a VM starts up, it uses bandwidth and computing resources -- this leaves less bandwidth and computing power to restart a second VM, which leaves even fewer resources to restart a third, and so on. For a system with 10 or more workloads, a complete recovery could take considerable time.
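A toy model makes the compounding effect easier to see. Everything in the sketch below is an assumption chosen for illustration, not a measurement: it supposes VMs are restarted one after another, that each running VM permanently occupies a fixed share of the recovery host's resources, and that a restart slows down in proportion to the resources still free. The function name, the 60-second baseline and the 8% share are all hypothetical.

```python
def restart_times(vm_count, base_seconds=60.0, share=0.08):
    """Estimated restart time for each VM, restarted one after another.

    base_seconds: assumed restart time on an otherwise idle host.
    share: assumed fraction of host resources each running VM occupies.
    """
    times = []
    for already_running in range(vm_count):
        # Resources left for this restart after earlier VMs are back up.
        remaining = 1.0 - already_running * share
        if remaining <= 0:
            raise ValueError("host has no resources left to restart another VM")
        times.append(base_seconds / remaining)
    return times

one = restart_times(1)
ten = restart_times(10)
print(f"1 VM:   {sum(one):.0f} s in total")
print(f"10 VMs: {sum(ten):.0f} s in total, last restart {ten[-1]:.0f} s")
```

Under these assumptions each successive restart takes longer than the one before it, so total recovery time grows faster than the number of VMs on the failed host.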
Consequently, organizations must address and mitigate this greater reliance on fewer server platforms. In addition to increased network bandwidth and storage I/O performance, next-generation servers often have superior reliability features, such as redundant power supplies or memory sparing. Other approaches include server clustering or VM replication to ensure that mission-critical workloads have a second (or even a third) instance that can take over if needed. Regardless of the approach, the goal is to alleviate restoration problems by preventing workload disruptions in the first place.
In general, virtualization-aware systems management software can correlate resource utilization for each workload against the resources available on any server. IT administrators can then make informed decisions about VM migrations and the impacts on remaining resources to decide on the best server consolidation ratio for their organization.
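The kind of correlation described above can be approximated with a simple placement calculation. The sketch below is a minimal illustration under stated assumptions, not the algorithm used by any commercial management product: it assumes identical hosts, considers only peak CPU and memory demand, caps each host at a utilization ceiling (80% here, an arbitrary figure) to preserve headroom, and places workloads with a first-fit, largest-first heuristic. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Workload:
    name: str
    cpu: float   # peak CPU demand, in cores
    mem: float   # peak memory demand, in GB

@dataclass
class Host:
    cpu: float
    mem: float
    workloads: list = field(default_factory=list)

    def can_take(self, w, ceiling):
        """True if adding `w` keeps CPU and memory under the ceiling."""
        used_cpu = sum(x.cpu for x in self.workloads) + w.cpu
        used_mem = sum(x.mem for x in self.workloads) + w.mem
        return used_cpu <= self.cpu * ceiling and used_mem <= self.mem * ceiling

def plan(workloads, host_cpu, host_mem, ceiling=0.8):
    """Assign workloads to identical hosts, adding a host only when needed."""
    hosts = []
    # Largest workloads first, so big ones are not stranded at the end.
    for w in sorted(workloads, key=lambda x: (x.cpu, x.mem), reverse=True):
        target = next((h for h in hosts if h.can_take(w, ceiling)), None)
        if target is None:
            target = Host(host_cpu, host_mem)
            if not target.can_take(w, ceiling):
                raise ValueError(f"{w.name} exceeds an empty host's usable capacity")
            hosts.append(target)
        target.workloads.append(w)
    return hosts

# Twelve identical workloads on 16-core, 64 GB hosts (illustrative sizes).
demo = [Workload(f"vm{i}", cpu=4, mem=16) for i in range(12)]
hosts = plan(demo, host_cpu=16, host_mem=64)
print(f"{len(demo)} workloads on {len(hosts)} hosts "
      f"(about {len(demo) / len(hosts):.0f}:1)")
```

Raising or lowering the ceiling shows the trade-off directly: a higher ceiling yields a higher consolidation ratio but leaves less room for demand spikes and failover.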
This was first published in November 2013