One of the unfortunate side effects of virtual machine sprawl (or normal organizational growth) is that an organization may eventually outgrow its hypervisor cluster. When this happens, the administrative staff must carefully consider their options for dealing with the problem.
How did you outgrow the cluster?
One of the first questions you need to address is how your organization outgrew the cluster. The growth probably stemmed from creating a large number of VMs, but that isn’t what I mean. What you need to determine is which part of the cluster was outgrown. In other words, did you run out of storage? Are the cluster nodes running the maximum number of VMs that your hardware will allow? Does your cluster contain the maximum number of cluster nodes? Your answers to these questions can point you toward the best solution to your problem.
Suppose for a moment that the cluster cannot accommodate any more VMs because all of the physical memory is being used and there isn’t any room in the servers for additional memory. In this type of situation, the best solution is likely to be adding nodes to the cluster. Of course, if the cluster already consists of the maximum number of nodes, then you will have to look for other solutions.
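The reasoning above can be sketched as a small decision helper. This is only an illustration: the `ClusterStats` fields, the 90 percent threshold, and the function names are assumptions made for the example, and the numbers would have to come from your own monitoring tools.

```python
from dataclasses import dataclass


@dataclass
class ClusterStats:
    # Hypothetical inputs gathered from your own monitoring or inventory tools.
    node_count: int
    max_nodes: int            # node limit supported by your hypervisor
    memory_used_gb: float
    memory_total_gb: float
    storage_used_tb: float
    storage_total_tb: float
    free_memory_slots: int    # empty memory slots across all nodes


def exhausted_resources(stats, threshold=0.9):
    """Return the names of resources at or above the utilization threshold."""
    found = []
    if stats.memory_used_gb / stats.memory_total_gb >= threshold:
        found.append("memory")
    if stats.storage_used_tb / stats.storage_total_tb >= threshold:
        found.append("storage")
    if stats.node_count >= stats.max_nodes:
        found.append("nodes")
    return found


def suggest_action(stats, threshold=0.9):
    """Map the exhausted resources to a rough next step."""
    exhausted = exhausted_resources(stats, threshold)
    if not exhausted:
        return "no action needed yet"
    if "memory" in exhausted:
        if stats.free_memory_slots > 0:
            return "add memory to existing nodes"
        if "nodes" not in exhausted:
            return "add nodes to the cluster"
        return "look for other solutions (cleanup, split, or second cluster)"
    if "storage" in exhausted:
        return "add storage"
    return "cluster is at its node limit; plan ahead for other solutions"
```

The helper only encodes the order of questions discussed here; the right threshold and the right action still depend on your hardware and hypervisor.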
Can the cluster be cleaned up?
One option for dealing with the problem of outgrowing a cluster is to look for ways of recapturing wasted resources. In all likelihood, there are some VMs that are no longer being used. Identifying and then deprovisioning unneeded VMs will allow you to reclaim hardware resources, thereby extending the life of the cluster. Admittedly, this can be a very painstaking process, especially for organizations with large numbers of poorly documented VMs. The good news is that working through this process can do more than simply help you to reclaim hardware resources. It can also help the organization to establish better practices for documenting VMs in the future.
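As a rough illustration of the cleanup pass, the sketch below flags VMs for human review. The inventory format (`name`, `last_active`, `owner`), the 90-day idle window, and the function name are all assumptions made for this example. It does not talk to any hypervisor; it only filters records you have already exported.

```python
from datetime import date, timedelta


def cleanup_candidates(vms, today, idle_days=90):
    """Return {vm_name: [reasons]} for VMs that deserve a closer look.

    `vms` is a hypothetical inventory: a list of dicts with the keys
    'name', 'last_active' (a date), and 'owner' (a string, possibly empty).
    """
    cutoff = today - timedelta(days=idle_days)
    flagged = {}
    for vm in vms:
        reasons = []
        if vm["last_active"] < cutoff:
            reasons.append("idle")
        if not vm.get("owner"):
            reasons.append("undocumented")
        if reasons:
            flagged[vm["name"]] = reasons
    return flagged
```

Anything flagged here is a candidate for review, not for automatic deletion. A VM that looks idle may still be needed, and an undocumented VM is a prompt to fix the documentation first.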
Considerations for building a new cluster
In some cases, there is simply no getting around the need for building a brand-new cluster. There comes a point when a cluster's hardware is being fully utilized, and there is no room left for future growth. In these types of situations, there are some factors you need to consider prior to building a second cluster.
One such consideration is how the cluster will be managed. In a Microsoft environment, for example, a clustered Hyper-V deployment can be managed through the Failover Cluster Manager. However, this native tool does not scale well, so if you are planning to have multiple clusters, you will be better off managing them with a tool such as System Center Virtual Machine Manager.
Another point to consider is how cluster offloading will be accomplished. If you are building a brand-new cluster, then you will presumably want to move some of the workloads off the existing cluster and onto the new cluster. You must therefore consider which VMs should be migrated and whether you are going to be able to migrate the VMs without downtime.
What are your immediate needs?
One more point worth thinking about is your immediate capacity needs. As you are no doubt aware, building a brand-new cluster can be an expensive undertaking. As long as you are not planning to immediately create a large number of new VMs, you may be able to reduce the cost by splitting your existing cluster into two separate clusters rather than building a brand-new cluster.
If you are building the cluster from scratch, then you will need to purchase enough new server hardware to create at least the minimum number of nodes required by your cluster. Conversely, splitting an existing cluster into two smaller clusters may put you in a position in which both resulting clusters already have more than the minimum number of required nodes. Of course, you will still have to increase capacity, so you can join new nodes to your two clusters on an as-needed basis. Depending on the size and capacity of your existing hypervisor cluster, this approach may reduce the amount of hardware that you have to purchase. The downside is that splitting a cluster can be labor-intensive and may result in downtime for some VMs.
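The hardware comparison can be made concrete with a little arithmetic. The function below is a minimal sketch; the strategy names and the `min_nodes_per_cluster` parameter are assumptions for illustration, and the actual minimum depends on your hypervisor and cluster design.

```python
def nodes_to_purchase(existing_nodes, min_nodes_per_cluster, strategy):
    """Estimate how many new nodes are needed up front for each strategy.

    'build_new' keeps the existing cluster intact and buys a full minimum
    set of nodes for the second cluster. 'split' divides the existing nodes
    between two clusters and buys only what is missing to reach the minimum
    in both.
    """
    if strategy == "build_new":
        return min_nodes_per_cluster
    if strategy == "split":
        return max(0, 2 * min_nodes_per_cluster - existing_nodes)
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, with eight existing nodes and an assumed minimum of two nodes per cluster, building new requires two nodes up front while splitting requires none. This counts only the initial purchase; it ignores the labor and possible downtime of a split, and any nodes added later for capacity.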
Any time that you outgrow a hypervisor cluster, there are going to be some tough decisions that need to be made. Unfortunately, there is no course of action that is universally the best option in every situation. You must instead evaluate your organization's own needs, your existing hardware, and the underlying cause for outgrowing the cluster.