Although it may seem obvious, the more virtual machines (VMs) you put on a single physical server, the greater the detrimental effect will be when that physical server suffers an outage. But you might say, "Andrew, I thought the point of virtualization was to create a zero-downtime infrastructure where, if a physical server goes offline, all of its VMs are just failed over to a separate physical server?" You are correct; that is the idea, but unfortunately reality has a nasty habit of deviating from pristine theory.
High availability technologies, such as VMware VI3 High Availability (HA), work by creating a single, logical resource cluster out of several physical servers that all have access to the same shared storage (typically Fibre Channel (FC) or iSCSI). When a physical server in the cluster fails, the cluster's HA agent directs the remaining online physical servers to take ownership of the offline VMs and restart them. Because there is a resource penalty during a VM boot, the virtualization software typically starts only three VMs at a time on a given host. If you have deployed a virtualization solution on several 4U servers with eight processors and 64 gigabytes (GB) of random access memory (RAM) in order to host 30 VMs on each server, then the failure of a single server forces the relocation of 30 VMs. The speed at which this process happens depends completely on the virtualization solution at hand, with VMware VI3 offering one of the most intelligent HA agents available (and highly so! HA! I kill me.).
The above scenario represents all too well a real-world situation, and it provides evidence that a high consolidation ratio (30 VMs on a single server) also represents a high risk ratio.
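To put rough numbers on that scenario, here is a minimal Python sketch of how long a failover might take. It is only an illustration: the function name, the assumption that the failed host's VMs are spread evenly across the surviving hosts, and the two minutes per batch of boots are all hypothetical and are not taken from VMware or any other product. The only figure that comes from the discussion above is the limit of three simultaneous VM starts per host.

```python
import math


def estimate_failover_minutes(vms_to_restart, surviving_hosts,
                              concurrent_boots_per_host=3,
                              minutes_per_boot_wave=2.0):
    """Rough time until every VM from a failed host is running again.

    Assumptions (hypothetical, for illustration only):
    - the VMs are spread evenly across the surviving hosts
    - each host boots VMs in batches ("waves") of a fixed size
    - every wave takes the same fixed number of minutes
    """
    if surviving_hosts < 1:
        raise ValueError("at least one surviving host is required")
    # The busiest surviving host determines when the last VM is back.
    vms_on_busiest_host = math.ceil(vms_to_restart / surviving_hosts)
    waves = math.ceil(vms_on_busiest_host / concurrent_boots_per_host)
    return waves * minutes_per_boot_wave


# A 4U host with 30 VMs fails and two hosts remain in the cluster.
print(estimate_failover_minutes(30, 2))   # 10.0

# A 1U host with 8 VMs fails in a rack of 42, leaving 41 hosts.
print(estimate_failover_minutes(8, 41))   # 2.0
```

The absolute numbers mean little, because real boot times vary widely. The point is the shape of the relationship: the more VMs a single host carries, the more boot waves the survivors must work through before the last service is back.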
The more you consolidate, the greater the risk you assume. This cannot be avoided. The same is true of any type of consolidation project, physical or virtual. If you were to consolidate 10 web sites on a single physical web server using Apache virtual hosts, then you would have a consolidation ratio of 10 to 1, but you would also have assumed a risk ratio of 10 to 1. If that one server goes offline, you lose ten services. The level of risk is diminished with virtualization through its high availability feature set, but it is not erased. High availability failovers take time -- downtime. The question is: how much downtime are you willing to risk for greater consolidation?
A rack with 42 1U rack-mounted servers that host only 5-10 VMs apiece will provide much speedier failover times, but you have not consolidated physical hardware nearly as much as you could with 4U servers, or even blade servers. Blade servers themselves represent another type of consolidation: physical hardware consolidation through shared components such as power supply units (PSUs), network interface cards (NICs), and Fibre Channel host bus adapters (HBAs) for SANs. This just goes to show that in a virtualization project the number of VMs to place on a single server is not the only consolidation choice you make; the type of server you choose also affects the consolidation ratio. If you use blade servers, then you increase the consolidation ratio and, with it, the assumed risk.
A data center that is risk free will have completely redundant power, cooling, networking, and servers. Those servers will each host only a single service, and each service will operate in a cluster. Get the picture? Being risk free is not without its own price -- the risk ratio is inversely proportional to money spent.
To operate a close-to-risk-free data center costs a pretty penny. The secret is to discover the most harmonious balance between risk management and the amount of money spent to get there. This will help you determine your consolidation ratio, and subsequently assumed risk.
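One way to look for that balance is to add up, for each candidate server type, the yearly cost of the hardware and the money you expect to lose to outages, then compare the totals. The Python sketch below does exactly that. Every name and number in it is a made-up assumption for illustration (the cost per host, the failure rate, the outage length, and the loss per VM-minute); substitute figures from your own environment.

```python
import math
from dataclasses import dataclass


@dataclass
class HostOption:
    name: str
    vms_per_host: int
    yearly_cost_per_host: float        # hypothetical: amortized purchase plus running costs
    outage_minutes_per_failure: float  # hypothetical: time to restart that host's VMs


def yearly_total_cost(option, total_vms, failures_per_host_per_year,
                      loss_per_vm_minute):
    """Infrastructure cost plus expected downtime loss for one year."""
    hosts = math.ceil(total_vms / option.vms_per_host)
    infrastructure = hosts * option.yearly_cost_per_host
    # Simplification: every host is treated as full, so the loss is
    # slightly overstated when total_vms is not a multiple of vms_per_host.
    expected_loss = (hosts * failures_per_host_per_year
                     * option.vms_per_host
                     * option.outage_minutes_per_failure
                     * loss_per_vm_minute)
    return infrastructure + expected_loss


def cheapest_option(options, total_vms, failures_per_host_per_year,
                    loss_per_vm_minute):
    """Return the option with the lowest combined yearly cost."""
    return min(options, key=lambda o: yearly_total_cost(
        o, total_vms, failures_per_host_per_year, loss_per_vm_minute))


small = HostOption("1U, 10 VMs per host", 10, 4000.0, 8.0)
large = HostOption("4U, 30 VMs per host", 30, 9000.0, 20.0)

# Expensive downtime favors the lower consolidation ratio...
print(cheapest_option([small, large], 120, 0.5, 50.0).name)
# ...while cheap downtime favors the higher one.
print(cheapest_option([small, large], 120, 0.5, 5.0).name)
```

With these invented inputs, the answer flips depending on what a minute of downtime costs you, which is why the ratio should fall out of the calculation rather than go into it.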
In conclusion, you should not approach a consolidation project with a specific consolidation ratio in your head. Approach the project with the goal of finding the middle ground between the acquisition cost and total cost of ownership (TCO) of an IT infrastructure and the amount of money lost if that infrastructure, or part of it, goes down for a given amount of time. This will help you determine the right amount of assumed risk, which in turn should set your target consolidation ratio.