The phrase "too big to fail" often describes large organizations whose influence is so strong that their failure would have a devastating effect on their own or related markets.
In the U.S., this term has been used to justify government intervention on Wall Street, in the auto industry and elsewhere. But the concept of "too big to fail" also applies to high-performance computers in your virtual machine (VM) infrastructure.
In recent years, there have been tremendous gains in technology and infrastructure. New processor advancements from Intel and AMD have led to a generation of multicore and multi-threaded processors that achieve amazing results within a single socket. Add in the advancements in storage and memory densities, and your virtual machine infrastructure can have some serious power in a relatively small package.
The power of high-performance computers
The standard server offerings shipping today make the clustered computing environments of just a few years ago look like my old Apple Newton (even with the 2 MB PCMCIA expansion card). With the power of these high-performance computers, however, there comes great responsibility.
Four years ago, I began building VMware ESX servers with rackmount servers from Hewlett-Packard that had two quad-core processors, 32 GB of RAM and 64 GB of local SCSI storage. Now, in my virtual machine infrastructure, I am building ESX servers on half-height HP blades with two quad-core Nehalem processors -- hyper-threaded so each presents eight logical cores -- with 144 GB of RAM and 64 GB of local high-speed solid-state drive storage.
Those first environments got 20 to 30 virtual machines on each host, and I have yet to really max out a server with this new hardware. To really see the raw computing power available, consider that these new servers could go up to 192 GB of RAM and that you can fit 16 of them in a single blade enclosure. That is some serious computing power, and it will only get better with HP's G7 hardware on the horizon.
You look at that processing power and think, "What a great era we live in." When first reviewing these scenarios for a new VMware vSphere environment, I actually got excited at the thought of ridiculous VM densities and a greatly reduced data center footprint ... until my thoughts drifted to high availability (HA). I realized that these new, high-performance computers may lead us to create systems that are indeed too big to fail.
High-performance computers and high availability
Just what does an ESX host failure involve? Without getting into the details of admission controls and other HA settings, it means that all the VMs from a given host attempt to restart on other ESX hosts within the HA cluster. When you have a fair number of ESX hosts with a lower VM density, that load spreads pretty well.
But if your virtual machine infrastructure has just a few very large servers and you have a higher VM density, you have just kicked off a boot storm. You will have a large number of VMs all competing for shared disk, CPU and memory on the few servers left standing. That can quickly create a domino effect, with one HA event leading to another.
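The trade-off above can be made concrete with a little arithmetic. The sketch below is a simplified illustration, not VMware's actual admission-control math, and the host counts and VM densities are assumed numbers chosen only to compare two cluster designs of roughly equal total capacity:

```python
# Rough sketch: per-survivor restart burden after one host fails.
# Assumes VMs restart evenly across the remaining hosts, which real
# HA placement only approximates.

def restart_burden(hosts: int, vms_per_host: int) -> float:
    """VMs each surviving host must absorb when one host fails."""
    assert hosts > 1
    return vms_per_host / (hosts - 1)

# Design A: many modest hosts, lower VM density (250 VMs total)
many_small = restart_burden(hosts=10, vms_per_host=25)
# Design B: a few huge hosts, higher VM density (~250 VMs total)
few_large = restart_burden(hosts=3, vms_per_host=84)

print(f"10 small hosts: each survivor absorbs ~{many_small:.1f} extra VMs")
print(f" 3 large hosts: each survivor absorbs ~{few_large:.1f} extra VMs")
```

Same workload, same failure: the small-host cluster asks each survivor to pick up about three VMs, while the large-host cluster dumps dozens of simultaneous boots on each remaining server -- the boot storm described above.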
Failures happen. They cannot be avoided. As great as the ESX and vSphere HA features are, they only recover from failures; they cannot prevent them. Because an HA event is a matter of when, not if, your true HA goal is to minimize its effects. This strategy flies in the face of the usual goals of maximizing efficiency and driving toward higher VM densities. But as you engage in this delicate balancing act, you may begin to realize that some of these great new high-performance computers may not actually be the best fit for your virtual machine infrastructure.
In selecting an ESX host platform, there are a lot of outside factors to consider. How many hosts will you have in your cluster? How many VMs will need to restart in case of an HA event? Will you use a dedicated HA host or preserve a percentage of resources on all ESX hosts?
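The "preserve a percentage of resources" question has a simple back-of-the-envelope answer. This is an assumed spare-capacity model for illustration, not vSphere's actual admission-control algorithm: to tolerate F host failures in an N-host cluster, roughly F/N of every host must sit idle.

```python
# Simplified N+F spare-capacity model (illustrative assumption only).

def reserved_fraction(hosts: int, failures_to_tolerate: int = 1) -> float:
    """Fraction of each host's capacity to keep free for HA restarts."""
    assert hosts > failures_to_tolerate >= 1
    return failures_to_tolerate / hosts

for n in (3, 5, 10, 16):
    pct = reserved_fraction(n) * 100
    print(f"{n:2d} hosts: reserve ~{pct:.0f}% of every host for failover")
```

Note how the reserve shrinks as the cluster grows: a three-host cluster of giant servers idles a third of every host, while a 16-blade enclosure of smaller hosts gives up only about 6% each -- another argument against building a cluster from just a few very large machines.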
There is no hard rule, no definite right or wrong. You have to weigh the options to determine which strategy best meets your unique business goals. Just beware the shiny new servers with outrageous resources. You may ultimately create a monster that is truly too big to fail.
About the author:
Mark Vaughn (MBA, VCP, BEA-CA) serves as an enterprise architect for a multinational corporation. Vaughn has more than 14 years of experience in IT as a Unix administrator, developer, Web hosting administrator, IT manager and enterprise architect. For several years, he has focused on using the benefits of virtualization to consolidate data centers, reduce total cost of ownership, and implement policies for high availability and disaster recovery. Vaughn is a recipient of the 2009 vExpert award and has delivered several presentations at VMworld and BEAWorld conferences in the U.S. and Europe. Read his blog at http://blog.mvaughn.us.