VMware ESX 3.5 Update 2 bug wreaks havoc

VMware ESX 3.5 Update 2 contains a bug that hinders operation of virtual machines and VMotion. Some admins vented frustrations, and others sang the praises of Microsoft Hyper-V.


Administrators who upgraded to VMware Inc.'s latest release of ESX and ESXi, 3.5 Update 2, had a terrible day on Aug. 12.

UPDATE

VMware has released an emergency patch to fix the problem. For more information, see the Virtualization Pro blog.

Because of a bug in ESX 3.5 Update 2, users who shut down their machines on Aug. 11 received the error message "A General System error occurred: Internal error" when they tried to restart their virtual machines (VMs), or to VMotion VMs running on ESX 3.5 Update 2 servers, on Tuesday, Aug. 12. Virtual machines that were already running were unaffected.

VMware acknowledged the problem on the afternoon of Aug. 12 by posting a statement on its VMware Knowledge Base site, and by the end of the day it had released an express patch.

VMware also pulled the ESX 3.5 Update 2 bits from the download pages so that no additional customers could download the broken build.


VMware advised users not to install ESX 3.5 U2 if it had been downloaded prior to Aug. 12, 2008. To work around the issue, VMware suggested setting the host time to a date prior to Aug. 12.

Unfortunately, VMware's workaround probably didn't work for most production environments.

"This workaround has a number of very serious side effects that could threaten production environments. Any VMs that sync time with an ESX host and serve time-sensitive applications will be broken. These include, but are not limited to, database servers, mail servers, and domain administration systems," VMware reported on the Knowledge Base site.

ESX admin fallout
The ESX 3.5 Update 2 bug may have affected far more VMware shops than a bug in an earlier version of ESX would have, because Update 2 was the basis for the newly free ESXi, whose price was cut last month.

And those users made their grievances known, complaining about the bug on forums like the Ars Technica Server Room.


One blogger wrote, "It's pretty bad all around. I would hate to be in an environment with super-strict change management right now."

Another blogger on the forum wrote, "All of my hosts are in a production DRS [Distributed Resource Scheduler] cluster, and thus all of the hosts are populated with some number of guests. I am going to have to down at least one full host's worth of guests to apply this patch. I would guess that this is the situation most ESX admins will face."

On the forum, users who run VMware upgrades in their test-and-development environments for a few weeks before moving them into production thanked their lucky stars, and some users sang new praises for Microsoft Hyper-V.

Dan Buchanan, a senior Microsoft engineer at a major global financial services provider and a longtime VMware user, said VMware's slow reaction to the bug is unacceptable.

"It took VMware up until 1 p.m. [on Tuesday, Aug. 12] to post an official statement on the issue, and they still have not reached out to their customers," Buchanan said in the afternoon. [Editors' note: Later in the day, VMware CEO Paul Maritz did in fact issue an apology about the bug on a VMware blog.] "They said they expect to have the issue resolved within 36 hours. That is unacceptable to users."

Turning back the system clock to a date before Aug. 12 proved tricky for Buchanan, because many of the systems at the financial institution are time-sensitive.

"We had to quickly change the date to Aug. 10, get the VMs running again and work as fast as we could to change the time to Aug. 12 again. Once the VMs are running, it works fine. It's when we shut down that is the issue," Buchanan said.

Luckily, Buchanan implemented the update only in his test environment, where he expected to let it "bake in" for a few weeks. Now he may not implement the update at all.

"I won't move the update into production unless they prove it is solid. If they fix it, I won't move it into production for 90 days," Buchanan said, adding that if he had put the update into his production environment, he would have been fired.

Released just last month, Update 2 added support for Windows Server 2008 and Solaris as guest operating systems, and for additional hardware such as 8 Gb Fibre Channel and 10 Gb iSCSI initiators. It also includes support for full server Health Status in ESX and ESXi; Red Hat Enterprise Linux (RHEL) 3.0 U9; live cloning of VMs; and enhancements to VirtualCenter alarms.

Let us know what you think about the story; email Bridget Botelho, News Writer. Also, check out our Server Virtualization blog.
