Most enterprise virtualization platforms provide some sort of high-availability architecture for VMs, usually based on shared storage and spare hypervisor capacity. But do you really need this for applications that have their own built-in availability? Newer cloud-scale applications use a load balancer in front of a group of disposable VMs to provide a highly available service. In some cases, we can apply this approach to other enterprise applications.
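To make the load-balancer pattern concrete, here is a minimal sketch of a round-robin balancer that simply skips any VM marked as down. The class name, method names and VM names (`web-1` and so on) are hypothetical and chosen for illustration; a production load balancer would also run health checks and handle connection draining.

```python
class RoundRobinBalancer:
    """Toy load balancer: rotates requests across VMs currently marked healthy."""

    def __init__(self, backends):
        self._order = list(backends)                   # fixed rotation order
        self._healthy = {name: True for name in self._order}
        self._next = 0                                 # index of the next candidate

    def mark_down(self, name):
        """Record that a VM has failed or been disposed of."""
        self._healthy[name] = False

    def mark_up(self, name):
        """Record that a VM is (back) in service."""
        self._healthy[name] = True

    def pick(self):
        """Return the next healthy VM, or raise if none are left."""
        for _ in range(len(self._order)):
            name = self._order[self._next]
            self._next = (self._next + 1) % len(self._order)
            if self._healthy[name]:
                return name
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["web-1", "web-2", "web-3"])
lb.mark_down("web-2")                    # simulate losing one disposable VM
picks = [lb.pick() for _ in range(4)]    # traffic continues on the survivors
```

The point of the sketch is that losing `web-2` needs no recovery action at the VM level: the service stays up as long as the balancer has at least one healthy VM to hand out.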
Many applications have a scale-up, single-server architecture. Think about a relational database server; to make a higher-performing server, you give the server more CPU and RAM, along with faster disks and networks. The database is only available when that one powerful server is running. This scale-up design requires an infrastructure that keeps the database server up as much as possible and recovers rapidly from failure. However, there is a cost to this kind of high-availability architecture. Usually the important VM must reside on expensive shared storage. We also need an enterprise-class hypervisor with features such as Microsoft Failover Clustering or VMware vSphere High Availability.
A more economical approach is a scale-out multiserver architecture, where the workload is shared across multiple servers. A classic example is a Web server farm. To improve website performance, you use more VMs and only care about having enough Web servers running to support the current load. Newer open source database servers have the same sort of architecture: Redis, Apache Cassandra and MongoDB all scale out for performance rather than requiring a single-node scale-up.
Many companies will find this approach difficult, since the new architecture means applications must be rewritten for the platform -- this is why most cloud applications we see were built from the ground up rather than migrated from an existing system.
The time and cost to rewrite these apps often means the VM stays on the expensive shared storage and clustered hypervisor. However, some parts of an environment can benefit from a scale-out virtual platform that uses low-cost servers and local storage to allow horizontal scaling. For this approach to work, the workload must be made up of a large number of disposable VMs that can be automatically provisioned to handle peaks in load and disposed of when the peak passes. Web servers are the classic example here, since they generally have only short-term connections from clients and usually store all of their data in a database server.
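The provision-and-dispose cycle described above can be sketched as a simple sizing rule. This is an illustration under stated assumptions, not a description of any particular platform: the function names, the per-VM capacity figure, the one-VM spare margin and the minimum and maximum pool sizes are all hypothetical.

```python
import math


def desired_vm_count(current_load, capacity_per_vm, spare=1, min_vms=2, max_vms=20):
    """Return how many VMs the pool should run for the given load.

    current_load and capacity_per_vm share a unit (for example requests
    per second). The defaults are illustrative assumptions.
    """
    if capacity_per_vm <= 0:
        raise ValueError("capacity_per_vm must be positive")
    needed = math.ceil(current_load / capacity_per_vm) + spare
    # Clamp to the pool limits so the farm never shrinks to nothing
    # or grows past what the hosts can hold.
    return max(min_vms, min(max_vms, needed))


def scaling_action(running, desired):
    """Translate the gap between running and desired VMs into an action."""
    delta = desired - running
    if delta > 0:
        return ("provision", delta)
    if delta < 0:
        return ("dispose", -delta)
    return ("hold", 0)


# Hypothetical peak: 950 requests/s against VMs that each handle 200.
peak_target = desired_vm_count(950, 200)
peak_action = scaling_action(running=4, desired=peak_target)

# After the peak passes, load falls and the extra VMs are disposed of.
quiet_target = desired_vm_count(100, 200)
quiet_action = scaling_action(running=peak_target, desired=quiet_target)
```

Because the VMs hold no unique data, disposing of them after the peak is just the reverse of provisioning; nothing needs to be migrated or backed up first.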
Another use case is a Citrix or Remote Desktop farm, where a group of VMs provide a scaled-out desktop service; even some types of virtual desktop workloads can scale this way. Any workload in which a load-balancing mechanism can distribute the load over a group of servers should work. Even a file server could work if the underlying file system distributed the replication task among a group of VMs. Since the individual VMs don't store any unique, valuable data, the host doesn't need to be as resilient, and there is no need to store the VMs on shared storage. This could allow an organization to use second- or third-tier hardware, or perhaps servers without redundant fans or power supplies. You may also be able to run the workload on a free or lower-cost hypervisor. This should cut the hardware cost of running these VMs at least in half, and possibly by more.
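A rough calculation shows why less resilient hosts can be acceptable for this kind of workload. The sketch below computes the probability that at least `k` of `n` hosts are up, using the binomial distribution. It assumes host failures are independent, which real environments only approximate (a shared switch or power feed breaks that assumption), and the availability figures are illustrative rather than measured.

```python
from math import comb


def pool_availability(n, k, host_availability):
    """Probability that at least k of n hosts are up at a given moment.

    Assumes each host is up independently with the same probability.
    """
    a = host_availability
    return sum(comb(n, i) * a**i * (1 - a) ** (n - i) for i in range(k, n + 1))


# One well-protected host that must always be up (illustrative figure).
single = pool_availability(n=1, k=1, host_availability=0.999)

# Six cheaper hosts at a lower assumed availability, where any four
# can carry the load.
pool = pool_availability(n=6, k=4, host_availability=0.99)

print(f"single host: {single:.6f}")
print(f"4-of-6 pool: {pool:.6f}")
```

Under these assumptions the pool of cheaper hosts comes out ahead of the single, more reliable host, because the service survives any two simultaneous host failures. The comparison only holds for workloads that truly keep no unique data on the individual VMs.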
This isn't going to work for every part of an enterprise IT shop. Many servers will still need to store a single copy of important data, so they will need high availability, backups and extensive disaster recovery plans. Even the stateless Web and desktop tiers will need a reliable data tier of database servers and file servers. With this approach, you can cut back on tier-one hardware and think about using tier-two services for more than the existing scale-out workloads. Maybe the same tier-two platform can be used to run the development systems that are currently on Amazon.