This week, the biggest news item on SearchServerVirtualization.com was the havoc caused by VMware Inc.’s ESX 3.5 Update 2 bug, which kept virtual machines (VMs) from booting up and from live migrating (VMotion) on Aug. 12.
Users vented their fury on IT forums like Ars Technica’s The Server Room. One user on the forum summed up the situation perfectly: “This was a very big deal, make no excuses for VMware. It certainly had potential to completely disrupt a lot of customers. … At most it should have disabled VMotion and other extras but not starting a VM.”
On the afternoon of Aug. 12, VMware issued an Express Patch on its Knowledge Base site and warned users not to install ESX 3.5 Update 2 or ESXi 3.5 Update 2 if it had been downloaded from VMware’s website or elsewhere before Aug. 12, 2008.
VMware’s new CEO, Paul Maritz, issued a letter of apology the same day, explaining the issue:
When the time clock in a server running ESX 3.5 or ESXi 3.5 Update 2 hits 12:00AM on August 12th, 2008, the released code causes the product license to expire. The problem has also occurred with a recent patch to ESX 3.5 or ESXi 3.5 Update 2. When an ESX or ESXi 3.5 server thinks its license has expired, the following can happen:
- Virtual machines that are powered off cannot be turned on;
- Virtual machines that have been suspended fail to leave suspend mode; and,
- Virtual machines cannot be migrated using VMotion.
The issue was caused by a piece of code that was mistakenly left enabled for the final release of Update 2. This piece of code was left over from the pre-release versions of Update 2 and was designed to ensure that customers are running on the supported generally available version of Update 2.
… I am sure you’re wondering how this could happen. We failed in two areas:
- Not disabling the code in the final release of Update 2; and
- Not catching it in our quality assurance process.
We are doing everything in our power to make sure this doesn’t happen again. VMware prides itself on the quality and reliability of our products, and this incident has prompted a thorough self-examination of how we create and deliver products to our customers. We have kicked off a comprehensive, in-depth review of our QA and release processes, and will quickly make the needed changes.
I want to apologize for the disruption and difficulty this issue may have caused to our customers and our partners. Your confidence in VMware is extremely important to us, and we are committed to restoring that confidence fully and quickly.
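The failure mode described in the letter, a pre-release expiry check that stayed enabled in the final build, can be illustrated with a minimal sketch. This is not VMware's code: the names `BETA_EXPIRY`, `EXPIRY_CHECK_ENABLED`, `license_valid` and `can_power_on` are hypothetical, and the cutoff simply mirrors the date and time quoted in the letter.

```python
from datetime import datetime

# Hypothetical illustration only; none of these names come from VMware.
# A pre-release build carries a hard expiry so testers move to the final release.
BETA_EXPIRY = datetime(2008, 8, 12, 0, 0)

# Assumed build flag: meant to be switched off for the final release,
# but mistakenly left on in this sketch, as the letter describes.
EXPIRY_CHECK_ENABLED = True


def license_valid(now, expiry_check_enabled=EXPIRY_CHECK_ENABLED):
    """Return False once the leftover pre-release expiry date is reached."""
    if expiry_check_enabled and now >= BETA_EXPIRY:
        return False
    return True


def can_power_on(now, expiry_check_enabled=EXPIRY_CHECK_ENABLED):
    """Power-on (like resume and migration) is gated on a valid license."""
    return license_valid(now, expiry_check_enabled)
```

In this sketch, a host whose clock reads one minute before midnight on Aug. 12 can still start VMs, while the same host a minute later cannot. Passing `expiry_check_enabled=False` models a release build in which the check was correctly disabled.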
It remains to be seen whether Maritz’s apology is enough to satisfy frustrated users. A major issue like this may prompt users to try other virtualization products. For instance, on the day of the incident, some users were singing the praises of Microsoft Hyper-V on technical forums.
Either way, having to deal with this issue after only a month in charge is a real trial by fire for Maritz.
And I imagine that VMware co-founder and ex-CEO Diane Greene, who was ousted by VMware’s board of directors July 8, might feel at least somewhat vindicated.