Fault-tolerant servers are making their way into data centers with small budgets. Companies like Stratus Technologies Inc. and Hewlett-Packard Co. are building more affordable options, and virtualization
With fault-tolerant servers, systems continue to operate properly when one or more faults in their components occur. If a primary component fails, a twin component seamlessly takes over the application running on that component. As a result, component failures don't result in loss of data or compromise applications as is the case with a typical server. Fault-tolerant servers differ from software-based failover clustering, in which a hardware or software failure on one server causes the workload to shift to a second server.
Even though most high-end servers employ at least some redundant components -- like hot-swappable power supplies or error-correcting code memory -- these servers still fail when a nonredundant component such as a microprocessor fails. In a fault-tolerant server box, the redundant components execute the same instructions in lockstep, and self-checking technology detects and isolates errors at the component level. When a hard error occurs, the faulty component is taken out of service while the duplicate component continues normal processing.
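The lockstep-and-isolate behavior described above can be illustrated with a minimal Python sketch. It is a toy model, not any vendor's actual design: the names `Replica`, `LockstepPair`, and `fail_at_step` are hypothetical, and the self-check is modeled as a simple duplicate-and-compare inside each replica.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Tiny instruction set shared by both replicas (illustrative only).
OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


class SelfCheckError(Exception):
    """Raised when a replica's internal self-check detects a mismatch."""


@dataclass
class Replica:
    name: str
    fail_at_step: Optional[int] = None  # hypothetical fault injection point
    in_service: bool = True
    steps: int = 0

    def execute(self, op: str, a: int, b: int) -> int:
        self.steps += 1
        primary = OPS[op](a, b)
        shadow = OPS[op](a, b)  # duplicate computation used for self-checking
        if self.fail_at_step == self.steps:
            shadow ^= 1  # simulate a hard error corrupting one result
        if primary != shadow:
            raise SelfCheckError(self.name)
        return primary


@dataclass
class LockstepPair:
    replicas: List[Replica]

    def run(self, program: List[Tuple[str, int, int]]) -> List[int]:
        results = []
        for op, a, b in program:
            outputs = []
            for replica in self.replicas:
                if not replica.in_service:
                    continue
                try:
                    # Every in-service replica executes the same instruction.
                    outputs.append(replica.execute(op, a, b))
                except SelfCheckError:
                    # Isolate the faulty component; its twin keeps processing.
                    replica.in_service = False
            if not outputs:
                raise RuntimeError("all replicas are out of service")
            results.append(outputs[0])
        return results


pair = LockstepPair([Replica("A", fail_at_step=2), Replica("B")])
program = [("add", 1, 2), ("mul", 3, 4), ("add", 5, 6)]
results = pair.run(program)
```

In this run, replica A fails its self-check on the second instruction and is taken out of service, while replica B completes the program, so the caller never sees the fault.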
Virtualization and the fault-tolerance landscape
As data centers adopt virtualization, the fault-tolerant server segment could become more important, though it is too early to predict by how much, said George Hamilton, director of enterprise infrastructure at the Boston-based technology research and consulting firm Yankee Group Research Inc.
"As people move into virtual environments, they are more concerned about downtime events," Hamilton said. "When one physical machine goes down, it affects many virtual machines running on it, so you want to make sure that infrastructure does not fail. If you are going to put all of your eggs in one basket, you'd better make sure that it is one hell of a basket."
According to Steve Josselyn, enterprise research director at Framingham, Mass.-based research firm IDC, the main players in the relatively small fault-tolerant market today are IBM, whose System z mainframe is basically used as a fault-tolerant system; Maynard, Mass.-based Stratus Technologies; HP, with its NonStop server line; and Santa Clara, Calif.-based NEC Corp.
Even with relatively few fault-tolerant server vendors in the space, this portion of the market generates substantial revenue, Josselyn said. Based on 2005 data, the latest from IDC, the fault-tolerant market makes up 4% of overall server spending: a $2.2 billion industry in a $54.8 billion market.
Thus far, barriers to broader consumer uptake have been cost and technical expertise. Prior to 2000, the typical cost for an entry-level fault-tolerant server running a proprietary operating system was $250,000. And in years past, writing programs for fault-tolerant servers was quite complicated, creating significantly higher initial and long-term costs, according to a Microsoft white paper.
Consumers of fault-tolerant servers have traditionally occupied niche markets where systems cannot go down, for financial or other environmental reasons, said Gordon Haff, an analyst at Illuminata Inc. of Nashua, N.H.
"By historical standards, the price premium is actually fairly modest," Haff said. "[But] it's a premium that buyers are mostly unwilling to pay, outside of specific vertical applications with particularly stringent availability requirements."
Now, fault-tolerant server vendors like Stratus Technologies are trying to change that with lower-priced versions of fault-tolerant servers, which the company believes will boost adoption.
In July, Stratus announced the availability of fault-tolerant servers for companies with small IT staffs and budgets. Together with the two-socket quad-core ftServer 6200 system announced in March, the new dual-core Intel Xeon-based one-socket 2500 and one- or two-socket 4400 models run Windows or Linux and offer wider configurability and workload support, more processing power, and greater I/O and memory capacity than previous generations.
The new ftServer 6200 has two quad-core Xeons on each of two boards -- four packages with a total of 16 processing engines -- and it delivers 450% more power at one-third the price of systems two generations back. "These are 4U servers, high-end machines sold at a $50,000 price point: a decrease in price with a dramatic increase in performance," said Denny Lane, director of product management and marketing for Stratus.
"Our systems are designed around simplicity," said Lane. "You really don't need much expertise to run these servers, which is particularly important in environments with little IT staff or in remote locations. On the low end, people are starting to use our  systems more because the prices are dropping [about $15,000 per box] and because of the low level of technical resources necessary to run them."
In May, Don Nguyen, the IT manager at a small Toronto-based Internet service startup called Yootel Communications, bought two Stratus ft5700 servers as well as Stratus software and uses them to deliver Voice over Internet Protocol services to residents and business customers.
Nguyen said he is pleased with server performance at this point and will add additional ones as the company grows.
If fault-tolerant server hardware is still too expensive, fault-tolerant software presents an alternative. Littleton, Mass.-based Marathon Technologies Corp.'s everRun FT software runs on two industry-standard x86 servers and creates a single virtual Windows environment. If one server goes down, there is no disruption to applications.
"The application takes the fault-tolerant server philosophy and applies it to software," said Steve Keilen, vice president of marketing at Marathon.
The benefit of a software approach is that a user can put it on two standard servers that are in separate data centers; each server is on a separate power grid for added protection, Keilen said. If both servers go down, however, the everRun FT layer will go down as well.
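The point that the pair fails only when both servers fail can be made concrete with a short calculation. This is a sketch under stated assumptions: the 99.9% per-server availability is a hypothetical figure, not one from Marathon, and the two servers are assumed to fail independently (which separate data centers and power grids make more plausible but do not guarantee).

```python
def pair_availability(single: float) -> float:
    """Availability of two redundant servers that fail independently.

    The pair is down only when both servers are down at the same time,
    so unavailability is the product of the individual unavailabilities.
    """
    return 1.0 - (1.0 - single) ** 2


# Hypothetical per-server availability, chosen only for illustration.
single_server = 0.999
redundant_pair = pair_availability(single_server)
```

Under these assumptions, two servers that are each available 99.9% of the time yield a pair that is available roughly 99.9999% of the time. Correlated failures, such as a shared network outage, would reduce that figure.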
Marathon's software works by pooling the physical resources of two standard Windows-based servers in a single operating environment. It does so by virtualizing the two servers to appear and operate as one. The everRun software sits below the server operating system and continuously monitors and tests all I/O components. When a failure occurs, the software immediately redirects I/O away from failed devices to redundant devices, so applications continue to operate without interruption and without loss of data, said Brian Mullins, director of corporate communications for Marathon.
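The I/O redirection Mullins describes can be sketched roughly as follows. This is not Marathon's implementation: `Disk`, `RedundantIO`, and `DeviceFailure` are hypothetical names, and the model assumes every write is mirrored to both devices so that either one can serve reads after a failure.

```python
class DeviceFailure(Exception):
    """Raised when an I/O device can no longer service requests."""


class Disk:
    """Toy block device with a switch for simulating a hardware failure."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}
        self.failed = False

    def write(self, block, data):
        if self.failed:
            raise DeviceFailure(self.name)
        self.blocks[block] = data

    def read(self, block):
        if self.failed:
            raise DeviceFailure(self.name)
        return self.blocks[block]


class RedundantIO:
    """Presents mirrored devices to the application as a single device."""

    def __init__(self, devices):
        self.active = list(devices)

    def write(self, block, data):
        # Mirror the write to every healthy device; retire any that fail.
        for dev in list(self.active):
            try:
                dev.write(block, data)
            except DeviceFailure:
                self.active.remove(dev)
        if not self.active:
            raise DeviceFailure("no redundant device left")

    def read(self, block):
        # Read from the first healthy device, redirecting on failure.
        for dev in list(self.active):
            try:
                return dev.read(block)
            except DeviceFailure:
                self.active.remove(dev)
        raise DeviceFailure("no redundant device left")


disk_a, disk_b = Disk("server1-disk"), Disk("server2-disk")
io = RedundantIO([disk_a, disk_b])
io.write(0, b"order-1")
disk_a.failed = True          # simulate a device failure on the first server
io.write(1, b"order-2")       # transparently continues on the second server
value0 = io.read(0)
value1 = io.read(1)
```

The application calls only `io.write` and `io.read`, so it sees one device throughout. The failed disk is dropped from the active set, and the data written before and after the failure remains readable from the surviving mirror.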
"The two physical Windows servers truly appear and operate as though the application was operating on a single standalone server," Mullins said. "This single virtual server appears to the rest of their environment with one identity and one IP address. Their application only sees this single virtual server which is where it would be installed and operated."
For everRun FT, the servers need to be identical, but with other software offerings, like everRun HA, the servers do not have to be identical.
Both versions of everRun work with standard off-the-shelf Intel Corp. and Advanced Micro Devices Inc. servers with Windows Server 2003 and at least 1 GB of RAM and 6 GB of storage. The software does not require a storage area network but can work with one.
The list price for everRun FT is $16,000 -- much less expensive than buying a fault-tolerant server. Keilen, who worked at Stratus for three years, said there are pros and cons to the two approaches, like scalability tradeoffs. The user also has to purchase three copies of Windows Server 2003 -- one for each server, plus a third for applications.
"Our software is much more flexible than fault-tolerant hardware in fitting in with a company's existing IT infrastructure," said Mullins. "Whatever Windows servers they already are buying will work just fine with our stuff -- no special hardware that you have to get the standards committee to approve. No proprietary parts or expensive service contracts. No application modifications or scripting. [It is] fault tolerance without the hassles or headaches."
Marathon also expects to address the virtualization trend more directly. This April the company announced it had partnered with XenSource Inc. to integrate XenEnterprise with everRun FT and that it would extend everRun to protect the XenEnterprise hypervisor. Marathon also promised an intermediate high-availability product, which will enable XenEnterprise users to cluster individual virtual machines.
Let us know what you think about the story; email Bridget Botelho, News Writer.