Server uptime and hardware failure guide
In today's world, it seems 24/7 isn't enough. Customers don't just expect data to be at their fingertips; they demand it on multiple platforms and delivery methods. The demand for availability is still growing and often drives IT departments to push for costly infrastructure improvements. This can create a dynamic in which IT is accused of overspending, but still gets blamed for an outage. So, instead of pushing for upgrades, IT should start insisting business leaders make the hard decisions when it comes to how much server uptime is enough.
The rising demand for uptime
The average consumer does not understand why systems go offline, which can fuel misguided generalizations. Of course, we complain when sites such as Gmail or Netflix go offline, and compare it to the end of the civilized world. We turn to Twitter and other social media to complain that these sites are never online when, in reality, outages are the rare exception.
While outages are often damaging to organizations and can result in lost services or profits, social media has given the end customer the ability to take a previously isolated issue and broadcast it to the world in minutes. It used to be that press releases written by trained staff communicated issues and outages. But, in today's world, issues can be revealed to the public before all of the proper personnel at a company even know about the problem. This creates public relations challenges for companies at the worst possible times. These challenges have helped to renew a focus on availability and brought server uptime to the forefront of IT discussions.
What do all those nines mean?
Today, uptime is often measured by "nines." Is your data available 99% of the time or 99.999%? More importantly, is there really that much of a difference? Let's look at how those availability levels translate to downtime:
| Uptime level | Downtime per year |
|---|---|
| 99.9% (three nines) | 8.76 hours |
| 99.99% (four nines) | 52.56 minutes |
| 99.999% (five nines) | 5.26 minutes |
| 99.9999% (six nines) | 31.5 seconds |
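The downtime figures in the table follow from simple arithmetic: the unavailable fraction of the year multiplied by the length of the year. A minimal Python sketch of that calculation is below. It assumes a 365-day year (8,760 hours), which is consistent with the table's numbers; the function name `downtime_per_year_hours` is ours, chosen for illustration.

```python
# Assumption: a non-leap, 365-day year, i.e. 8,760 hours.
HOURS_PER_YEAR = 365 * 24


def downtime_per_year_hours(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, implied by an availability level."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    # The unavailable fraction of the year, scaled to hours.
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


if __name__ == "__main__":
    for level in (99.9, 99.99, 99.999, 99.9999):
        hours = downtime_per_year_hours(level)
        print(
            f"{level}% -> {hours:.4f} hours, "
            f"{hours * 60:.2f} minutes, {hours * 3600:.1f} seconds"
        )
```

Each added nine cuts the allowed downtime by a factor of 10, which is why the jump from hours to seconds happens in only three steps.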
The goal for many companies is 99.9999% availability, but costs can increase greatly with each nine you add. Moving from one level to the next can require anything from redundant servers to redundant storage frames or even duplicate data centers. Reaching the 99.9999% uptime level can cost thousands or millions of dollars. The decision to move forward with this level of uptime should not be an IT decision, but a business decision. While increasing server uptime depends directly on what IT purchases and implements, IT cannot drive the decision.
Reframe server uptime for a business perspective
Every disaster recovery or availability project I have been involved with has been driven from an IT perspective. After all, who knows more about IT systems than the IT department? The problem is, when IT drives availability, the project takes on an IT perspective. It becomes easy to focus solely on the systems and hardware that need to be up from an infrastructure viewpoint.
Infrastructure is critical to a business, but it does not always address the needs of the business, and that is the real key here. While business applications rely on the underlying infrastructure, the infrastructure alone is not enough to make your business work. Simply put, the business has to determine what infrastructure it needs to support an application.
However, having the business drive availability does not mean that IT doesn't have a huge role in this effort. IT personnel have a responsibility to ensure the business has the correct knowledge to make informed decisions. Moving from 99.9% availability to 99.9999% is a process that requires investment in equipment, time and staffing resources, all of which must be presented to the business before a decision is made. IT must also make business leaders aware of the risks of not moving to a higher level of availability. These may include large-scale outages at the most inconvenient time, loss of customers and possible public relations concerns.
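One way IT might frame that trade-off for business leaders is as a break-even figure: how much would an hour of downtime have to cost before the investment pays for itself? The Python sketch below is a rough illustration only. It assumes a 365-day year, treats the availability target as the actual downtime experienced, and ignores harder-to-price effects such as lost customers and public relations damage. The function names and the upgrade cost are hypothetical and not taken from any real project.

```python
# Assumption: a non-leap, 365-day year, i.e. 8,760 hours.
HOURS_PER_YEAR = 365 * 24


def expected_downtime_hours(availability_percent: float) -> float:
    """Yearly downtime, in hours, implied by an availability level."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


def break_even_cost_per_hour(
    current_percent: float, target_percent: float, annual_upgrade_cost: float
) -> float:
    """Hourly outage cost at which the upgrade spend equals the downtime avoided."""
    hours_avoided = expected_downtime_hours(current_percent) - expected_downtime_hours(
        target_percent
    )
    if hours_avoided <= 0:
        raise ValueError("target availability must be higher than current availability")
    return annual_upgrade_cost / hours_avoided


if __name__ == "__main__":
    # Hypothetical yearly cost of the extra equipment, time and staffing.
    annual_upgrade_cost = 500_000
    threshold = break_even_cost_per_hour(99.9, 99.9999, annual_upgrade_cost)
    print(f"Upgrade pays for itself if downtime costs more than {threshold:,.0f} per hour")
```

If the business estimates that an hour offline costs less than the printed threshold, the higher availability level is hard to justify on direct cost alone. That judgment belongs to the business rather than IT.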
Increasing uptime can be both cost- and resource-prohibitive for many companies. When the decision comes from business leaders, the focus shifts from an IT initiative to a business one, which can make all of the difference. It is no longer about what new tools and toys IT wants, but what the business needs to retain customers. How much availability is enough is no longer a simple question. While the quick answer was always, "as high as possible," it has become a matter of weighing cost against risk. This is a decision that needs to come from the business with the input of IT and not the other way around. Only by looking at availability from that perspective does it become possible to answer the question of how much availability is truly enough.