Server uptime and hardware failure guide
Managing security, risk and outages in your virtual environments involves more than simply ensuring that you have proper backups, firewalls and passwords. These subjects are not only about policies, procedures and management; they are wide-open fields made up of interlocking pieces that need to be identified and addressed. Protection for these systems (both physical and virtual) comes in two key stages: the initial design and the operational element.
System outages can range from minor to catastrophic. When organizations put systems in place, they should do so with the understanding that outages can and will occur. As much as we would like (and try) to avoid outages and failures, they are a part of life. While we can't avoid all outages, we can work to limit how often they occur and how long they last. This is often expressed as the number of "nines" of availability a system provides. For example, a system might be online and available 99% or 99.9999% of the time. The key difference between the two is not simply more technology; it's the cost of adding each nine. That cost can range from thousands to hundreds of thousands of dollars depending on the environment, so it's important to understand what the nines really mean to your organization.
A system with 99% uptime sounds impressive at first, but when you run the numbers over the course of a year, its shortcomings become clear:
|Uptime (percent)|Downtime per year|
|---|---|
|99%|3.65 days|
|99.9%|8.76 hours|
|99.99%|52.56 minutes|
|99.999%|5.26 minutes|
|99.9999%|31.5 seconds|
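The downtime figures in the table follow from simple arithmetic: the fraction of the year a system is allowed to be unavailable, multiplied by the length of a year. The Python sketch below reproduces them, assuming a 365-day (8,760-hour) year; the function names are illustrative and not part of any particular tool.

```python
# Convert an availability percentage into the unplanned downtime it permits per year.
# Assumes a 365-day (8,760-hour) year, which matches the table above.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds


def annual_downtime_seconds(availability_pct: float) -> float:
    """Return the downtime (in seconds) allowed per year at the given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100 percent")
    return SECONDS_PER_YEAR * (100.0 - availability_pct) / 100.0


def human_readable(seconds: float) -> str:
    """Express a duration in the largest unit that keeps the value at or above 1."""
    for unit, size in (("days", 86_400), ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999, 99.9999):
        print(f"{pct}% -> {human_readable(annual_downtime_seconds(pct))} per year")
```

Each additional nine cuts the permitted downtime by a factor of ten; for example, 99.99% works out to 52.56 minutes per year. The same function is a quick way to check a vendor's availability claim against how much unplanned downtime your organization can actually tolerate.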
When looking at these figures, remember that they describe unplanned downtime. A total of 3.65 days of downtime over a year might be tolerable if it occurred on weekends when no one is in the office, but in reality it is more likely to strike during your busiest period, when you can least afford to have systems offline. Murphy's Law still applies.
While six nines might be ideal, it may not be cost-effective. There is no simple calculation for what it takes to go from two nines to six nines; many variables need to be considered. The process may involve duplicate servers, storage frames, networks, power distribution and even redundant data centers. With so many possible variables, how does an organization find its best fit, and how does that fit into an overall virtual server security framework?
One aspect of virtual server security is keeping your systems available. If your systems are unavailable, your customers will not see the difference between a denial-of-service attack and an ordinary system outage. In today's environments, virtualization and consolidation have enabled organizations to address business needs quickly and efficiently. The downside is that more systems now depend on less hardware, which makes that hardware even more critical.
Designing for redundancy is a challenge, and incorporating security into that design should be a deliberate methodology for your data center, one that spans several technology silos. Virtual server security is often placed on the back burner because the focus is normally on redundancy. Yet redundancy at one level often depends on many other levels, and if all of the pieces are not in place, you may not have the protection you think you do.
As an example, consider the power needs of your server infrastructure, which should be addressed in multiple stages.
Dual power supplies are often the starting point for protecting equipment against hardware failure. Ideally, each power supply is connected to a separate power feed backed by redundant uninterruptible power supplies (UPSes) and then generators. This kind of redundant power infrastructure is key to helping your infrastructure reach the highest levels of availability. However, if it still contains a single point of failure, such as a single generator, that becomes your security vulnerability.
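The point about layered redundancy can be made concrete with a standard reliability-block model: redundant components combine in parallel (the stage is down only if all of them are down), and dependent stages combine in series (the chain is down if any stage is down). The sketch below applies that model to a simplified power chain. It assumes failures are independent and treats every stage as something the server depends on in series, which glosses over the fact that generators only take over when utility power fails; all availability numbers are hypothetical.

```python
# Reliability-block sketch of a simplified power chain.
# Assumptions (hypothetical): failures are independent, each stage is modeled
# as something the server depends on in series, and the availability figures
# below are illustrative rather than measured.
from math import prod


def parallel(*availabilities: float) -> float:
    """Redundant components: the stage is down only if every component is down."""
    return 1.0 - prod(1.0 - a for a in availabilities)


def series(*availabilities: float) -> float:
    """Dependent stages: the chain is up only if every stage is up."""
    return prod(availabilities)


# Hypothetical per-component availabilities.
PSU, FEED, UPS, GENERATOR = 0.999, 0.999, 0.9995, 0.995

fully_redundant = series(
    parallel(PSU, PSU),              # dual power supplies in the server
    parallel(FEED, FEED),            # separate power feeds
    parallel(UPS, UPS),              # redundant UPS units
    parallel(GENERATOR, GENERATOR),  # redundant generators
)

single_generator = series(
    parallel(PSU, PSU),
    parallel(FEED, FEED),
    parallel(UPS, UPS),
    GENERATOR,                       # one generator: a single point of failure
)

print(f"fully redundant chain:  {fully_redundant:.6f}")
print(f"single-generator chain: {single_generator:.6f}")
```

Because series availability is a product of values no greater than 1, a chain can never be more available than its weakest stage. With these hypothetical figures, the single 99.5% generator caps the whole chain below 99.5%, no matter how much redundancy sits in front of it.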
Networking is the connection between your users and the applications and systems they need. Dual network connections for a server are a good start, but they provide little protection if both connections terminate at the same network switch. Even if they terminate on different switches, both switches may draw power from the same source. And even with proper redundancy at the infrastructure level, do both connections pass through the same intrusion detection system (IDS) or firewall? A shared IDS or firewall is a choke point and a potential vulnerability.
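One way to hunt for these shared dependencies is to list every component on each supposedly redundant path and intersect the lists: anything that appears on every path can take all of them down by itself. The sketch below assumes you can enumerate the paths by hand or from an inventory; the component names are hypothetical and mirror the dual-NIC scenario described above.

```python
# Find single points of failure across supposedly redundant paths.
# A component on every path is a single point of failure, because losing it
# alone takes down all paths. Component names are hypothetical examples.


def single_points_of_failure(paths: dict[str, set[str]]) -> set[str]:
    """Return the components shared by every path."""
    if not paths:
        return set()
    return set.intersection(*paths.values())


paths = {
    "nic1": {"nic1", "switch-a", "pdu-1", "firewall-1", "ids-1"},
    "nic2": {"nic2", "switch-b", "pdu-1", "firewall-1", "ids-1"},
}

print(sorted(single_points_of_failure(paths)))  # ['firewall-1', 'ids-1', 'pdu-1']
```

Here the two paths use different NICs and switches, yet the shared power distribution unit, firewall and IDS are all flagged. The approach only finds individual components; it will not catch combinations of components that must fail together to cause an outage.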
Even organizations that have redundant data centers may find critical pieces missing when they attempt to fail over. Many failover sites include common components, such as Active Directory or NTP servers, but do they also have IDS coverage and properly updated firewalls? It might not be possible to fail over every system, including auditing and logging servers. This limitation is often understood, but the risks still need to be documented for management.
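Failover gaps like these can be checked routinely rather than discovered during a real disaster. The sketch below compares a hypothetical inventory of the primary site with that of the failover site, reporting components that are missing entirely and components (such as firewall rule sets) whose revision lags behind production. The inventory format and revision numbers are assumptions for illustration; in practice they would come from your own configuration management data.

```python
# Compare a primary site's inventory with its failover site and report
# components that are missing or running an older configuration revision.
# The inventory format and revision numbers are hypothetical.


def failover_gaps(primary: dict[str, int], failover: dict[str, int]) -> dict[str, list[str]]:
    """Return components missing at the failover site and ones whose revision lags."""
    missing = sorted(name for name in primary if name not in failover)
    stale = sorted(
        name
        for name, revision in primary.items()
        if name in failover and failover[name] < revision
    )
    return {"missing": missing, "stale": stale}


# Hypothetical configuration/rule-set revision numbers for each component.
primary_site = {"active-directory": 12, "ntp": 3, "firewall-rules": 41, "ids": 7, "syslog": 5}
failover_site = {"active-directory": 12, "ntp": 3, "firewall-rules": 38}

gaps = failover_gaps(primary_site, failover_site)
print("missing at failover site:", gaps["missing"])
print("out of date at failover site:", gaps["stale"])
```

Gaps that cannot be closed, such as logging servers that are deliberately not replicated, are exactly the risks that should be written up for management.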
Availability is a critical part of your security umbrella and is more than a set of configurations. It is about identifying your single points of failure and either addressing them or, in the worst case, documenting them for management. Layers of redundancy help, but they can all break down when a single point of failure is exploited. Simply buying a server, software package or hardware device rated for more nines of availability does not solve redundancy or virtual server security.