This content is part of the Essential Guide: Prepare and prevent: Virtual recovery and backup strategies
Get started Bring yourself up to speed with our introductory content.

Proactively prevent downtime by protecting VMs from a server fault

You may not be able to avoid a server fault, but you can mitigate its affect by distributing critical workloads across multiple physical servers.

What are the risks for workloads on virtual servers? Is reliability more important for virtual servers than for traditional physical server platforms?

Virtualization has vastly improved server utilization, allowing more workloads to run on fewer physical platforms. Although this has been a significant benefit to businesses, it has also created vulnerabilities that IT professionals must consider and address within the data center. Running more workloads on fewer hardware platforms carries additional risk for the enterprise because more workloads are impacted by hardware failures.

The principal benefit of server virtualization is improved resource usage; each physical server can run multiple virtual machines. This is fine under ideal circumstances, but the risk of a server fault or failure remains. Prior to virtualization, a server typically hosted a single application, meaning a server failure only affected that particular workload. When a virtualized server hosts five, 10, 15 or more virtual machines, a server fault can affect multiple workloads.

Workload recovery can take more time than administrators expect in a virtual environment. Consider that a virtual machine starts working when it's reloaded into memory, and that VM will demand a portion of computing and networking resources. This leaves fewer server resources and network bandwidth to restore subsequent VMs. A server with many VMs may experience significant downtime before all of the VMs are successfully restored and relaunched.

With the widespread use of virtualization, each physical server is now far more important to the enterprise because each is likely to be running several important applications. IT professionals must plan for server problems and contingencies. One strategy is to consider the workload distribution and stagger critical workloads across multiple physical servers. This prevents a single server fault from disrupting most (if not all) of the organization's critical applications.

Another important strategy involves failover and workload balancing. Rather than maximizing server use, administrators intentionally leave a portion of unused resources on each server so that VMs disrupted on one server can quickly be migrated to (or restarted on) another server. This allows the workloads of a troubled server to be moved to other servers while the afflicted system is serviced.

Over the long term, IT professionals want to prevent servers from failing in the first place. This normally involves selecting and upgrading server hardware designed and built with superior reliability components and features. For example, a business that chooses to increase its server consolidation level often acquires more powerful servers that have additional computing resources (e.g., CPU cores, memory, NIC ports and so on) along with high-availability features.

Dig Deeper on Disaster recovery, failover and high availability for virtual servers

Join the conversation

1 comment

Send me notifications when other members comment.

Please create a username to comment.

How do you deal with a server fault?