Keep SAN infrastructure at 100% uptime -- or else

A SAN infrastructure is arguably the most important piece of your virtual environment. Priority one is to ensure that a SAN doesn't take your entire virtual infrastructure offline.

When I built weather satellites for the U.S. government, we used a framework called failure mode and effects analysis (FMEA). It analyzes a system's potential failure modes and calculates the anticipated consequences.

FMEA is a complex undertaking, but it was absolutely necessary for a $10 billion satellite. A relatively minor failure in a forgotten subsystem could cause the entire satellite to crash into someone's backyard.

But how does this relate to virtualization and storage area network (SAN) infrastructures? Well, today's virtual environments place an exceptionally heavy responsibility on their centralized storage infrastructures. Live Migration and vMotion both require centralized SANs for virtual machines (VMs) to fail over and load-balance.

SANs: The linchpin of virtual infrastructure
Because of this requirement, almost every virtual environment has to implement a SAN infrastructure. But that dependence also increases the adverse effects and costs associated with a SAN failure.

To illustrate my point, draw out the interdependencies between each component in your virtual infrastructure. For each component, draw a line to another component on which it relies. Continue this exercise until you map out the entire dependency tree. (The end result is similar to an FMEA scenario.)

Notice that all the arrows eventually point back to your SAN infrastructure. As a result, your SAN infrastructure uptime must be close to, if not at, 100%. Any storage downtime, especially extended downtime, creates a catastrophic failure of your entire virtual environment, because it forces every VM, server and application to go offline.
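The dependency-mapping exercise can also be sketched in code. The following is a minimal Python illustration, not a tool from any vendor: the component names and the `DEPENDS_ON` map are hypothetical stand-ins for whatever your own inventory contains. It inverts the "relies on" arrows and walks them outward from a failed component to list everything that goes down with it.

```python
from collections import deque

# Hypothetical inventory: each component maps to the components it relies on.
# Replace these names with the ones from your own dependency drawing.
DEPENDS_ON = {
    "web-app": ["app-vm"],
    "app-vm": ["hypervisor-host", "vm-datastore"],
    "db-vm": ["hypervisor-host", "vm-datastore"],
    "hypervisor-host": ["san"],
    "vm-datastore": ["san"],
    "backup-server": ["san"],
    "san": [],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or transitively relies on `failed`."""
    # Invert the arrows: for each component, record who relies on it.
    dependents = {}
    for name, deps in depends_on.items():
        dependents.setdefault(name, [])
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    # Breadth-first walk outward from the failed component.
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    print(sorted(impacted_by("san", DEPENDS_ON)))
```

In this made-up inventory, a SAN failure reaches every other component, while a failure of `app-vm` reaches only `web-app`. Running the same check against each component in your own map is a quick way to see which single failures have the widest reach.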

That's a terrible situation, and it's not easy to restart completely from scratch. If these resources go offline, the interdependencies between your servers and applications will likely require a specific startup sequence that is time-consuming to carry out.
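One way to work out such a startup sequence ahead of time is to sort the components so that each one starts only after everything it relies on. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later). The `RELIES_ON` map and its component names are hypothetical, and a real recovery runbook would also need health checks and timing that this example leaves out.

```python
from graphlib import TopologicalSorter

# Hypothetical map: each component lists what must be running before it starts.
RELIES_ON = {
    "san": [],
    "hypervisor-host": ["san"],
    "vm-datastore": ["san"],
    "domain-controller-vm": ["hypervisor-host", "vm-datastore"],
    "db-vm": ["hypervisor-host", "vm-datastore", "domain-controller-vm"],
    "app-vm": ["db-vm", "domain-controller-vm"],
}


def startup_order(relies_on):
    """Return one valid order in which dependencies start before dependents."""
    # TopologicalSorter takes a mapping of node -> predecessors and raises
    # graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(relies_on).static_order())


if __name__ == "__main__":
    for step, component in enumerate(startup_order(RELIES_ON), start=1):
        print(step, component)
```

Several valid orders may exist. What matters is that the SAN comes first and that no VM is started before its host, datastore and upstream services.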

Storage vendors recognize this fact. Last year, Hitachi announced 100% storage uptime with its Hitachi High Availability Manager. DataCore's storage virtualization software now advertises 100% uptime at one of its hosting partners. High-end solutions from EMC, Hewlett-Packard and Dell offer zero-downtime options or the assurance of zero downtime during certain SAN operations. Even software-based SAN vendor StarWind Software offers zero downtime by replicating storage across an active/active, two-node storage cluster.

You can also achieve 100% storage availability on your own through a combination of technologies and techniques. You need multiple levels of redundancy for SAN power, disk drives, storage connections, storage processors and even fully redundant storage nodes (e.g., HP's modular storage solutions). Adding storage replication to secondary, on-site and off-site SANs will further protect your data.
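To see why layered redundancy matters, it helps to run the standard availability arithmetic. The sketch below rests on two loudly stated assumptions: component failures are independent of one another, and the availability figures are invented for illustration rather than taken from any vendor. Under those assumptions, components that must all work multiply their availabilities together, while N redundant copies fail only when all N fail at once.

```python
from math import prod

HOURS_PER_YEAR = 24 * 365


def series(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def redundant(availability, copies):
    """Availability of `copies` independent, interchangeable components,
    where one working copy is enough."""
    return 1 - (1 - availability) ** copies


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year at a given availability."""
    return (1 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Hypothetical per-component availabilities, for illustration only.
    power, controller, fabric = 0.999, 0.999, 0.999

    single_path = series([power, controller, fabric])
    dual_path = series([redundant(a, 2) for a in (power, controller, fabric)])

    print("single path hours down/year:", downtime_hours_per_year(single_path))
    print("dual path hours down/year:  ", downtime_hours_per_year(dual_path))
```

Each added layer of redundancy shrinks the expected downtime sharply, although under these assumptions the result approaches 100% rather than reaching it exactly. Correlated failures, such as a shared power feed or a firmware bug common to both nodes, break the independence assumption, which is one reason replication to a separate SAN adds protection.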

In the end, how much storage downtime can your virtual environment handle? The answer is not much, if any. Design multiple levels of redundancy, if you can afford it. Also, before a SAN infrastructure purchase, ask your vendor where the infrastructure's weak points are. A year down the road, you don't want a complete SAN failure to take down your entire computing infrastructure.

Greg Shields
Greg Shields is an independent author, instructor, Microsoft MVP and IT consultant based in Denver. He is a co-founder of Concentrated Technology LLC and has nearly 15 years of experience in IT architecture and enterprise administration. Shields specializes in Microsoft administration, systems management and monitoring, and virtualization. He is the author of several books, including Windows Server 2008: What's New/What's Changed , available from Sapien Press.
