The line between test and production environments is very clear when we consider service-level agreements and how much downtime each workload can tolerate. So it makes sense to keep test and production environments separate in case something goes wrong, right? This traditional thinking still makes sense for many companies using virtualization, but it may not be the best approach.
Separating test and production virtual machines (VMs) requires dual environments. However, maintaining that physical separation can be expensive. Cost isn't the only factor to consider. One of the main concerns is that an issue with a test server could affect a production server.
VMware and other hypervisors include controls to limit that exposure with resource pools, reservations and limits. These resource controls allow administrators to guarantee or restrict hardware resources, including CPU, memory, network and I/O bandwidth, for VMs depending on each workload's priority.
So if it's safe to mix test and production VMs on the same host, and it would save money, then why aren't more people doing it? Part of the reason is the traditional line of thinking and the fact that the limited cost savings doesn't seem to justify the merger. But, what if I told you that, contrary to the traditional perspective, separating test and production VMs is actually more risky for your environment?
When servers fail
Host servers can and will fail; that is part of life. No matter what we do, a hypervisor can still crash and hardware can still fail. Knowing and accepting that we will eventually have a problem helps us look at our environment differently. If we have 10 production hosts and five test hosts, we are twice as likely to lose a production host to a crash or hardware failure, simply because there are twice as many production servers that could crash. Given the choice, we would prefer to lose a less-critical test host, but crashes rarely occur when and where we would prefer.
What happens when we have a failure? Let's assume each server (both test and production) hosts 30 VMs, for a total of 450 VMs across the 15 hosts. If we lose a production host, its 30 VMs need to be restarted on the nine remaining production hosts, which comes out to 3.3 VMs per host. The restart will take a little time, but the real problem is the number of production VMs that crashed: in this example, 10% of all production VMs are offline at once.
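The failure arithmetic above is easy to check. Here is a minimal Python sketch using the host and VM counts from the example (the variable names are illustrative, not from any VMware tool):

```python
# Separated environments: 10 dedicated production hosts,
# 30 VMs on every host (figures from the example above).
PROD_HOSTS = 10
VMS_PER_HOST = 30

total_prod_vms = PROD_HOSTS * VMS_PER_HOST  # 300 production VMs

# One production host fails: its 30 VMs must restart
# on the 9 surviving production hosts.
failed_vms = VMS_PER_HOST
restart_per_surviving_host = failed_vms / (PROD_HOSTS - 1)
offline_fraction = failed_vms / total_prod_vms

print(f"VMs to restart per surviving host: {restart_per_surviving_host:.1f}")  # 3.3
print(f"Share of production VMs offline: {offline_fraction:.0%}")              # 10%
```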
Spreading risk by mixing test and production environments
However, if we combine the test and production environments, we can mitigate the damage of losing one production server. With a combined pool of 15 hosts, we would still average the same 30 VMs per host, but using Distributed Resource Scheduler (DRS) rules we can average 20 production and 10 test VMs per host. This reduction in production VMs per host means that in the event of a host crash, we now lose 20 production VMs rather than 30. This 33% reduction, combined with good DRS rules, gives your company quite a bit of room to minimize the impact. By reducing the number of production VMs that crash and adding extra hosts to the cluster, your restart queue drops from 3.3 production VMs per host to 1.4.
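The combined-cluster numbers can be verified the same way. This short Python sketch uses the 20/10 production/test split per host that the DRS rules in the example are assumed to enforce:

```python
# Combined cluster: 15 hosts, still 30 VMs per host, but DRS rules
# keep the mix at 20 production + 10 test VMs per host.
TOTAL_HOSTS = 15
PROD_PER_HOST = 20
SEPARATED_LOSS = 30  # production VMs lost per failure in a dedicated cluster

# One host fails: 20 production VMs restart on the 14 survivors.
lost_prod = PROD_PER_HOST
restart_per_host = lost_prod / (TOTAL_HOSTS - 1)
reduction = 1 - lost_prod / SEPARATED_LOSS

print(f"Production VMs lost per failure: {lost_prod}")                 # 20, was 30
print(f"Restart queue per surviving host: {restart_per_host:.1f}")     # 1.4
print(f"Reduction in lost production VMs: {reduction:.0%}")            # 33%
```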
The reduced failure impact and faster restarts are not the only advantages of mixing test and production. If you have a host running all production VMs, what happens when you start to run out of resources? Resource pools, reservations and limits help you manage those resources, but they force you to decide which production servers are more important. Not all production workloads are created equal, but try telling that to application owners. With a combined production and test environment, you have more room to squeeze resources from test servers when production VMs need them. This additional cushion allows your environment to absorb an outage or a spike in demand without disruption.
Combining test and production environments definitely jumps a well-established line and pushes many outside their comfort zone. However, as VMware continues to push us out of our traditional environment into the software-defined data center where the rules are written in pencil and can be changed at a moment's notice, it pays to think outside of the traditional data center box.
This was first published in February 2014